Apple Silicon M1: Black Magic Fuckery (singhkays.com)
1042 points by singhkays 62 days ago | 1083 comments



Not to speak for anyone else, but one thing I gently disagree with:

>Given that Hackintoshers are a particular bunch who don’t take kindly to the Apple-tax[...]

I have zero issues with an Apple premium or paying a lot for hardware. I think a major generator of interest in hackintoshes has been that there are significant segments of computing that Apple has simply completely (or nearly completely) given up on, including essentially any non-AIO desktop system above the Mini. At one point they had quite competitive PowerMacs and then Mac Pros covering the range from $2k all the way up to $10k+, and while sure, there was some premium, there was feature coverage, and they got regular yearly updates. They were "boring", but in the best way. There didn't need to be anything exciting about them. The prices did steadily inch upward, but far more critically, sometime between 2010 and 2012 somebody at Apple decided the MP had to be exciting or something and created the Mac Cube 2, except this time they forced it by eliminating the MP entirely. And it was complete shit, and to zero surprise it never got a single update (since they totally fucked the power/thermal envelope, there was nowhere to go), and users completely lost the ability to make up for that. And then that was it, for 6 years. Then they did a kind of sort of OK update, but at a bad point given that Intel was collapsing, and they forced in some of their consumer design in ways that really hurt the value.

The hackintosh, particularly virtualized ones in my opinion (running macOS under ESXi deals with a ton of the regular problem spots), has helped fill that hole as Frankenstein MP 2010s finally hit their limits. I'm sure Apple Silicon will be great for a range of systems, but it won't help in areas that Apple just organizationally doesn't care about/doesn't have the bandwidth for because that's not a technology problem. So I'm a bit pessimistic/wistful about that particular area, even though it'll be a long time before the axe completely falls on it. It'll be fantastic and it's exciting to see the return of more experimentation in silicon, but at the same time it was a nice dream for a decade or so to be able to freely take advantage of a range of hardware the PC market offered which filled holes Apple couldn't.


Apple does not want to sell to the hackintosh/enthusiast market because they are the most price-conscious segment. Targeting that segment means putting out extremely performant, low-margin commodity machines. Doing so then cannibalizes the market for their ultra-high-end stuff.

Not only that, though. Enthusiasts are also extremely fickle and quick to jump ship to a cheaper hardware offering. If you look at all of Apple’s other markets, you’ll see loads of brand loyalty. Fickle enthusiasts don’t fit the mould.


When Apple first mandated kext signing (Mountain Lion?) they explicitly whitelisted certain community-built kexts used for Hackintoshes. IMO Apple and the Hackintosh community have mutually benefited until now. Many who became accustomed to macOS through a Hackintosh have eventually invested in Apple products.

Considering Apple has only gone after those who profited by selling pre-built Hackintoshes, and not everyone profiting from the Hackintosh scene, I would say Apple did care about the Hackintosh community in some way.

I thought the higher performance/price of Hackintoshes, especially with Ryzen, might force Apple to act differently, but now with the M1, Apple needn't worry about Hackintosh performance/price anymore.


> Apple does not want to sell to the hackintosh/enthusiast market because they are the most price-conscious segment. Targeting that segment means putting out extremely performant, low-margin commodity machines. Doing so then cannibalizes the market for their ultra-high-end stuff.

Looking over my shoulder at a 64-core Threadripper with 256GB of ECC RAM, a 3090 FE, a Titan RTX, and a Radeon VII: yeah, right. Some of us do Hackintoshing because we want more dope specs than what Apple offers, and the customizability that comes with PC hardware.


What Apple could have done is continue supplying something like the G4 towers. Those were stunningly beautiful and practical machines.


What if they simply decided that they didn't care for that part of the market? At some point we should just accept that.


Sure, but there's a legit gap in the datacenter— not having a sanely, legally rackable OS X machine is a pretty big problem for a lot of organizations. Not everyone wants to do their Jenkins builds or generate homebrew bottles on a Mac Mini under someone's desk.


Is this really an issue? They sell shelves that let you rack 2 Mac Minis in 1U space. You can also buy a rack mount Mac Pro if you want to spend really big bucks.


Isn’t the newest Mac Pro available in rack form?


Ah, so it is, and the thermal story there is definitely much better than with the Mini, there being a clear intake/exhaust flow. OTOH, there's still likely a gap in terms of management features, and the starting price of $6.5k for a 4U system is definitely going to be a barrier for some. Good to know there's at least something, anyway.


None of those use cases seem relevant for iOS or macOS development.


Surely you'd want CI builds for your app? I suppose you can always go the sassy option and just offload this problem onto Travis or CircleCI, but then they're the ones stuck figuring out how to rack thousands of Mac Minis, dealing with thermals in a machine that isn't set up for hot/cold aisles, a computer that doesn't have a serial port or dedicated management interface, etc.
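For what it's worth, here is a minimal sketch of what such a CI job boils down to on a macOS build machine. The repo URL, project path, and scheme name are hypothetical placeholders; a real setup would wrap this in Jenkins, GitHub Actions, or whatever the org already runs:

```python
# Minimal sketch of a macOS CI step: clone, then build and test with xcodebuild.
# Repo URL, project path, and scheme name below are placeholders.
import subprocess

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)  # fail the job if any step fails

def ci_build():
    run(["git", "clone", "https://example.com/MyApp.git", "MyApp"])
    run([
        "xcodebuild", "test",
        "-project", "MyApp/MyApp.xcodeproj",
        "-scheme", "MyApp",
        "-destination", "platform=iOS Simulator,name=iPhone 12",
    ])

if __name__ == "__main__":
    ci_build()
```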

If you're a big enough org or the app is for internal use, this might not be an option anyway. At that point I imagine most people just give up on it and figure out how to run macOS on a generic VM. But at that point you have to convince your IT department that it's worth doing a thing that is definitely unsupported and in violation of the TOS.

Or maybe some of these are big enough that they are able to approach Apple and get a special license for N concurrent instances of macOS running on virtualized hardware? Who knows.


No company on the planet is big enough for Apple to make exceptions like that. All of them either use a cloud provider or a custom rack design just for Mac Minis.


Companies like Google or Microsoft aren't big enough? Google's Chrome and Microsoft Office alone, I would wager, are more than big or popular enough to get special treatment.

Adobe is smaller by contrast, but I'd speculate it has a much deeper relationship with Apple as well.


All of them use Mac Minis as far as I know.


Nope, just build straight from Xcode.


Well sure, for a single person team. But as soon as you're working with other people, surely you want an independent machine making builds and running tests— this is literally item 3 on the Joel Test.


You would be surprised how some teams actually develop code, even timesharing iMacs among teams.


I wonder what the overlap is between those teams who do not invest in their infrastructure, and those who ship broken products.


If you ever get a chance to meet employees at CircleCI or some other CI provider at a conference after Covid is over, consider asking them about how they rack Mac Minis.


Good for them, I just use Xcode.


Ah yes, how could they have been so blind. They should have just put Xcode in the server racks.


What servers?


The entire thread is about running OS X in data centers. In data centers you run servers, if you didn't notice.


pjmlp's view appears to be that because their customers, who are not experts, don't know enough to ask for continuously tested software, they don't believe it is their professional responsibility to provide that either. This allows them to dismiss any complaints about macOS in datacenters as irrelevant.


On the contrary, testing doesn't come for free, and everyone gets what they care to pay for.


I'm not a consultant, but I believe it would be an ethical failing on my part to hand someone else a piece of code without extensive, automated testing and CI.


You have an 80-hour budget to deliver X features, no compromise or no pay; feel free to decide how to deal with testing.


Well, thank you for providing the first compelling argument as to why software practices need to be more formally regulated. Providing CI/CD should be the industry norm and expected default.


macOS is not a server OS, and the Xserve has long since stopped being an option.


I can't tell if you're being facetious or you just don't care about automated testing and continuous integration for your code.


CI/CD is largely ignored in plenty of consulting gigs.

I care to the extent customers care.


So true. I've done a lot of freelance work over the past 20 years. CI/CD has never come up. You're sometimes lucky if you can even set up a test system / site.


I pity your customers.


Their choice, no need to pity them.


Apple tried selling Xserves for years.


And to this day they're still the most gorgeous servers ever made, especially the Xserve RAID.


I don't think the answer is for Apple to force people into buying custom server hardware any more than it is to force them into making janky rack setups for Mac Minis.

The answer that most people would like to see would be a stripped down, non-GUI macOS that's installable at no cost in virtualization environments, or maybe with some evaluation scheme like Windows Server has, which effectively makes it free for throwaway environments like build agents.


> The answer that most people would like to see would be a stripped down, non-GUI macOS that's installable at no cost in virtualization environments

That's called "Darwin" and it's theoretically open source, but there doesn't seem to be a useful distribution of it. Whether that's due to lack of community interest or lack of Apple support is the question.


A useful distribution (for building anyway) would require all the headers and binaries from macOS, which wouldn’t be distributable, right? So you’d have to have enough of a free system to be able to get to the point where that stuff could be slurped out of a legit macOS installation. Sounds like an interesting challenge.


Okay, but offering no machine suitable for developers and power users will eventually hurt them when those users leave the whole ecosystem.


> offering no machine suitable for developers and power users

This perception strikes me as having warped in from a different decade. Nowadays, at least in my neck of the woods, developers almost universally use laptops, and Apple's still plenty competitive in the (high end) laptop department.

For the most part, the only developers I know who still use desktops are machine learning folks who don't like the cloud and instead keep a Linux tower full of GPUs in a closet somewhere. And then remote into it from a laptop. Half the time it's a MacBook, half the time it's an XPS 13. And they were never going to consider a Mac Pro for their training server, anyway, because CUDA.

I couldn't speak to power users, but my sense is that, while it meant something concrete in the '90s, nowadays it's a term that only comes out when people want to complain about the latest update to Apple's line of computers.


I work in games, where we write C++ in a multi-million-LOC codebase. Every developer in my company has a minimum of 12 cores and 96GB of RAM. All of the offices are backed by build farms on top of this. There are entire industries that rely on very high-end hardware. (Of course we also rely on lots of Windows-only software too, but that's only an issue once the hardware is solved.)


Fair, and we could spend ages listing all the different kinds of people who have really specific job descriptions that require them to have traditional, stationary workstations. And then we could follow that up with lists of all the reasons why they need to be running Windows or Linux on said workstations, and couldn't choose comparable Apple hardware even if it were available.

But I don't think that we need to beat a dead horse like that. The more interesting one would be to figure out some interesting and non-trivially-sized cross-section of people who both need a workstation-class computer, and have the option of even considering using OS X for the purpose.


The main reason to buy Apple x86 machines for any OS developer was that Apple has to keep their number of hardware variants to a minimum, and you can run compatibility (and truly same-hardware performance) tests against any OS, as OS X was the only one locked to its hardware. The same might be true for ARM if there are adequate GPL drivers so as not to exclude Linux/Android, etc.


I'm not sure that's true. At least in my experience, Boot Camp seemed almost designed to cripple Windows by contrast to OS X.

The last time I used it (the last MBP with Ethernet built in; I want to say 2012 or 2013?), some of the features "missing" in Boot Camp were:

- No EFI booting. Instead we emulate a (very buggy!) BIOS

- No GPU switching. Only the hot and power hungry AMD GPU is exposed and enabled

- Minimal power and cooling management. Building Gentoo in a VM got the system up to a recorded 117 degrees Celsius in Speccy!

- Hard disk in IDE mode only, not SATA! Unless you booted up OS X and ran some dd commands on the partition table to "trick" it into running as a SATA mode disk

The absolute, crushing cynic in me has always felt that this was a series of intentional steps. Both a "minimum viable engineering effort" and a subtle way to simply make Windows seem "worse" by showing it performing worse on a (forgive the pun) "Apples to Apples" configuration. After all, Macs are "just Intel PCs inside!", so if Windows runs worse, clearly that's the fault of bad software rather than subtly crippled hardware.


I think we used rEFIt. I remember it would be a bit finicky, but I never really had to boot Windows since my product had no equivalent, and these days I don't boot OS X, though firmware updates would be nice.


What if Apple decided that they don't stand to gain that much from AAA games, so they don't care about offering hardware that those companies might run on?

I have the feeling that Apple just cares about Apps for iOS (money wise). What's the minimum they need to do so people write iOS apps?

If this hardware, incidentally, is good for your use case, all is good. If not, they might just shrug and decide you're too niche (i.e. not adding too much value to their ecosystem) and abandon you.


Yes, I think they view the basic mid-range tower box as a nearly-extinct form. Like corded telephones & CRTs.

They choose to make the Mac Pro into some kind of halo product, I guess. But really, the slice of people who need more power than an iMac and less than this "Linux tower full of GPUs" or a render farm, they judge to be very small indeed. This wasn't true in the 90s, when laptops (and super-slim desktops) came with much bigger compromises.


I don't think they think it's a small market; I think they think it's a commoditized market with very thin margins. That form-factor has a literal thousand+ integrators building for it, and also in many segments (e.g. gaming) people build their own to save even more money. Those aren't the sort of people who are easily swayed to pay an extra $200+ of pure margin in exchange for "integration" and Genius Bar "serviceability" (the latter of which they could mostly do themselves given the form-factor.)


I guess people into hot-rodding, especially for games, have never been Apple's target. (Even if they are numerous, and I actually have no idea how large this segment is.) Besides price-sensitivity, wouldn't they be bored if there were only 3 choices? Maybe we will find out when the M2-xs or whatever arrives.


> suitable for developers and power users

Important to note 'some' here. I'm a developer and power user, and haven't had a desktop computer in almost 10 years.


Me too. I last had a desktop, at work, about that long ago, and have not bought a desktop computer for myself in a lot longer. Laptops got very good and I can still plug it into a monitor and external controllers when I need to. I don’t need a server at home because of the cloud and broadband.


> Okay, but offering no machine suitable for developers

You mean... software developers? The same people who almost universally use a Mac?


There's a massive US-centric bubble when it comes to Apple. iPhones and MacBooks are not in the majority, let alone universal, with software developers as a whole; just in pockets.


The five dollar latte crowd is willing to pay and consume. Walk into any café and good luck finding non-Apple machines. (Occasionally there will be a Surface or two, especially if you live in Seattle).


They are the most visible, but that does not mean they are the most important part of the ecosystem. Plus, in 5 years they will be reconsidering their workplace setup due to back pain and/or carpal tunnel. And Apple asks an arm and a leg for all the ergonomic accessories like an external monitor and dock.


> Plus, in 5 years they will be reconsidering their workplace setup due to back pain and/or carpal tunnel.

This is where I'm at.

I don't know if other people are built from sturdier stuff than me or what, but typing on a laptop to any significant extent leaves me with tendonitis for several days. And staring at a laptop screen too long leaves me with neck pain.

Laptops are a nightmare in terms of ergonomics.

It's been a bit of a blessing for me because I only have a laptop at home, and it basically means I can't take work home with me.

But I'm pretty seriously considering upgrading to a traditional desktop sometime in the next year.


Laptops are my ergonomic savior. I make sure it's on my lap, and that my elbows are on softly padded armrests and hang down gently, and this has given me decades of work after fierce carpal tunnel inflammation.

I also use a Wacom tablet comfortably placed on a table to my right.


You just... buy from someone else? You don't have to buy an external monitor or a dock from Apple.


Sure - so now they're unsuitable for developers?


> The same people who almost universally use a Mac?

This has become steadily less true since about 2012, in my experience. I don't know any full-time developers still using an Apple laptop. The keyboard situation caused a lot of attrition. I finally stopped support for all Apple hardware at my company months ago, simply to get it out of my headspace. Will Fusion360 again be completely broken by an Apple OS update? Am I going to have to invest time making our Qt and PyQt applications work, yet again, after an Apple update? Are Apple filesystem snapshots yet again going to prove totally defective? The answer is "no", because we really need to focus on filling customer orders, so we're done with Apple. ZFS snapshots function correctly. HP laptop keyboards work ok. Arch Linux and Windows 10 (with shutup10 and mass updates a few times per year) get the job done without getting in my face every god damned day.


> I don't know any full-time developers still using an Apple laptop.

Fascinating. I can name a few startups in my town that use Apple. One just IPO'd (Root), another is about to (Upstart). There are others as well.

The big companies it's hit or miss. Depends on if they are working on big enterprise applications or mobile/web. Mobile and web teams are all on MacBook Pros, and the big app dev teams aren't.

When I was last in Mountain View they were on Mac as well but I know that depends on personal preference.


>The same people who almost universally use a Mac?

* in very specific places and conditions.

Actual numbers from every single credible survey put Macs at a grand maximum of 25%.


Well, most corporations don't give developers a choice in what computer they use. I doubt that makes them unsuitable.


Macs keep dropping off our domain, so there's no real way to maintain their provisioned state.


I use a Mac because other developers in my office do. But I'd be just as productive on a Linux or Windows machine.

For a while OS X had the edge because it had a nice interface while still offering a lot of Unix. Now Windows and Linux have caught up in the areas where they were lacking before. Meanwhile Apple has been caring less and less about people using the CLI.


Quite possibly - I was a huge Apple fan who's now using a PC because I was fed up with the lack of viable options for me.


Apple will certainly offer an ARM-based Mac Pro, but I'm assuming it'll be a very different beast - the current one maxes out at 1.5TB of RAM, and it doesn't seem likely anyone will integrate that much memory on a chip anytime soon ;-)

Memory bandwidth is one key feature impacting the M1's performance. When Apple builds an ARM-based Mac Pro, we can expect something with at the very least 5 DDR5 channels per socket. It's clear from this that the M1 is a laptop/AIO/compact-desktop chip.


The M1 already has 8 LPDDR4x channels per socket, running at 4266MHz.
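For reference, a quick back-of-the-envelope on what that configuration works out to as peak theoretical bandwidth, assuming the usual 16-bit LPDDR4x channel width (my assumption, not stated in the comment above):

```python
# Peak theoretical bandwidth for 8 LPDDR4x channels at 4266 MT/s,
# assuming each channel is 16 bits wide.
channels = 8
channel_width_bits = 16        # assumed; LPDDR4x channels are typically 16 bits wide
transfer_rate = 4266e6         # transfers per second (4266 MT/s)

bus_width_bytes = channels * channel_width_bits / 8
peak_gb_per_s = bus_width_bytes * transfer_rate / 1e9
print(f"{peak_gb_per_s:.1f} GB/s")  # ~68.3 GB/s theoretical peak
```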


My bad. I was looking at the specs. It's 300GBps, which is roughly 5x DDR5 IIRC.

So yes, at the very least 8 DDR4 channels, or one per core, but I'd expect more from a workstation-class board.

Now, speaking of the board, all those memory channels will be funny.


Funny? 8 channels is the standard AMD Epyc socket. Most threadrippers (AMD's workstation chip) are 4 channel, but there is a variant that's 8 channel.


I would expect more, so that cores don't get memory starved. The M1 has 4 fast cores and 4 slow ones. If we imagine an M2 with 8 fast cores, I would expect it to need 16 channels to have the same performance. That's a lot.


Dunno, the M1 CPU package is tiny, thin, power efficient, etc. It's got 4 memory chips inside the package. I don't see any particular reason why a slightly larger package could have 4 memory chips on one side, and 4 chips on the other to double the memory bandwidth and memory size.

However the M1 is already pretty large (16B transistors), upgrading to 8 fast cores is going to significantly increase that. Maybe they will just go to a dual CPU configuration which would double the cores, memory bandwidth, and total ram.

Or move the GPU and ML accelerator offchip.


I'm a developer and haven't had a desktop in 15 years or so. It's been a mix of Thinkpads (IBM then Lenovo) and MBPs.

I'm guessing very few developers need the extra power a desktop offers over a high-end laptop.


even a low end laptop. i work in clojure for finance. digital nomad. thinkpad x220.


I think the developers and power users that still use desktop machines/towers are either a very CPU-power-hungry niche exception or the more backwards ones, and thus least likely to influence/be imitated by anyone...


I beg to differ (as a developer on a desktop). The reason for developing on a desktop is that my productivity is much higher with 3 screens, one of which is a 40-inch, plus a full 101-key keyboard and a mouse.


> The reason for developing on a desktop is that my productivity is much higher with 3 screens

Those requirements don’t dictate a desktop[0]. Also, the physical size of the monitor is irrelevant, it’s the resolution that matters. Your video card doesn’t care if you have a 40” 4K monitor or an 80” 4K monitor, to it, it’s the same load.

The reason I still have a cheese grater Mac Pro desktop at all is because I have 128gb RAM in it and have tasks that need that much memory.

[0] I’ve connected eight external monitors to my 16” MBP (with laptop screen still enabled, so 9 screens total). I don’t use the setup actively, did it as a test, but it very much works. The setup was as follows:

TB#1 - 27” LG 5K @ 5120x2880

TB#2 - TB3<->TB2 adapter, then two 27” Apple Thunderbolt Displays @ 2560x1440

TB#3 - eGPU with AMD RX580, then two 34” ultrawides connected over HDMI @ 3440x1440, two 27” DisplayPort monitors @ 2560x1440

TB#4 - TB3<->TB2 adapter, then 27” Apple Thunderbolt Display @ 2560x1440

So that’s almost 50 million pixels displayed on around 4,000 square inches of screens driven by a single MBP laptop.
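For the curious, the pixel arithmetic roughly checks out if you assume the 16" MBP's built-in panel at its native 3072x1920 (that last resolution is my assumption, not stated above):

```python
# Rough tally of the setup described above. The built-in 16" MBP panel
# is assumed to be running at its native 3072x1920.
displays = [
    (5120, 2880),                # LG 5K
    (2560, 1440), (2560, 1440),  # two Apple Thunderbolt Displays (TB#2)
    (3440, 1440), (3440, 1440),  # two 34" ultrawides on the eGPU (HDMI)
    (2560, 1440), (2560, 1440),  # two 27" DisplayPort monitors on the eGPU
    (2560, 1440),                # Apple Thunderbolt Display (TB#4)
    (3072, 1920),                # built-in 16" MBP panel (assumed)
]
total_pixels = sum(w * h for w, h in displays)
print(f"{total_pixels:,} pixels")  # 48,983,040 -- "almost 50 million"
```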


How was it to move (or find) the cursor?

(I kid, I kid)


You kid, but it legit was an issue. I've used at least 3 monitors (if not 1-2 more) for over a decade now, so I have experience there, but going up to 9 even for a short while, it was definitely an issue.


Yeah I’m with you. Laptops are great, but they sacrifice a lot for the form factor. Remove the constraint of needing an integrated screen, keyboard, touch pad and battery, and you can do much more. Sure you can dock it, but docked accessories are always second class citizens relative to the integrated stuff.


All of which are available on modern laptops


Laptop user here; I also have 3 screens. I do use the MBP's keyboard, but I never felt like that cost me productivity. I use a normal mouse as well. The only reason I can think of to need a desktop is the extra CPU/GPU capacity you can get.


> The only reason I can think of to need a desktop is the extra CPU/GPU capacity you can get.

Or RAM


Or internal peripherals. If I want 20TB of storage and I don't want external chassis all over the place, I need a desktop with at least a couple of 3.5" bays.


You mean you don't like paying $500 for 8GB of soldered RAM?


Nope, not what I'm saying at all (in part because your comment is hyperbolic and untrue). Some folks need more than 64GB of RAM, which is the highest amount most laptops have.


He is not that far off. Apple asks $200 for 8 GB, so he is within the same order of magnitude. For comparison, this week I bought 16 GB DDR4 ECC (unregistered) sticks for 67 EUR apiece (before VAT).


Great, so you bought a different type of RAM in a completely different form factor and paid a different price. This is on "processor package" RAM and will thus have an entirely different price basis than a removable stick would, not even factoring in the Apple Tax.

Furthermore, how is that relevant to the point _I_ was making about needing more than 64GB of RAM? If you both want to go off on a tangent, fine, do so, but don't try to put words in my mouth while doing it.


> Great, so you bought a different type of RAM in a completely different form factor and paid a different price.

It is called "using an example" or an "illustrative example". For comparison, I used a type of RAM that is traditionally much more expensive than what you find in laptops.

> This is on “processor package” RAM and will thus have an entirely different price basis than a removable stick would,

No.

1) The same price is being asked for RAM in non-M1 models.

2) You could put any price tag you want on it, because the item is single-sourced; the vendor can pull a quote out of thin air and you cannot find an exact equivalent on the market. Therefore, for comparison, a functionally and parametrically similar item is being used.

> how is that relevant to the point _I_ was making about needing more than 64GB of RAM?

You get a different product that supports more RAM.

> If you both want to tangent, fine do so, but don’t try to put words in my mouth while doing it.

Could you point out where I did that? I was pointing out that your note about the GP being hyperbolic is untrue - he was in the ballpark.


> I was pointing out that your note about the GP being hyperbolic is untrue - he was in the ballpark.

Essentially as "in the ballpark" as $80 is: both are off by 2.5x. Claiming they are "the same order of magnitude, so it's not hyperbolic" is laughable. $100k and $250k are both the same order of magnitude, but they are radically different prices, no?


At work, when at the office, they are always pushing screens on us. I keep thinking it's some pork deal with Dell. My whole team either plugs a laptop into one screen, or just works straight on the laptop. Maybe we're not cool.


A quick Google will turn up several serious usability studies that show more screen real estate == higher productivity. It depends a lot on the type of work, of course, but for development a larger screen would mean less scrolling and tab switching => less context switching => so your brain gets more done.


It's probably the ergonomics police.


Or... they're old and can't see the tiny laptop screen, or they get back pain when using a laptop all hunched over. To be honest, I don't know how anyone does serious work on them.


You can connect a laptop to 2-3-4 external screens. Which many do. You don't need a tower for that.


Apple itself is selling its new chips as making devices faster. If only a niche wanted that speed, Apple probably wouldn't be pushing it as part of the pitch so hard.


That would be if everything else was equal. Everything else is NOT equal. People also want portability, small size, battery life, etc.

If more than a niche had speed as its sole priority, then they would already use desktops, but most (80%+) use laptops today.

But of the majority that uses laptops, most would like a faster machine. Just would prefer it was also a laptop.


Considering how much the gaming side of the PC market will drop on just a single card, I am more of the opinion that Apple chose to avoid this market because it did not want the association with gaming, as if that were beneath their machines.

At times there seemed to be a real disdain for the people who loved upgrading their machines, as well as those who gamed on them. Apple's products were not meant to be improved by anyone other than Apple, and you don't sully them with games. The Mac Pro seems to be the ultimate expression of "You are not worthy", from the base system, which was priced beyond reason, to the monitor and stand. It was the declaration of "fine, if you want to play, then it will cost you", because they didn't really care about enthusiasts with the wrong use case - games and such.


Why do they highlight games at every WWDC and product announcement?


To let the fans have a bathroom break.


Being "fickle" is kind of hard to apply to a market segment, because it's not a synchronized monolith. There clearly is demand for the kind of decent machines Hackintoshes make. It's just that the mobile market has a much higher ROI. So any R&D the other Apple products get comes from coincidental opportunities. This entire M1 change is a happy accident.

So it's not that Hackintosh builders are anything at all; it's that they're outnumbered by iPhone buyers a million to one.


They certainly wouldn't want to scare away developers; others have suffered greatly from neglecting them, and developers seem to be getting rarer. You always want the enthusiasts, and of course they buy new hardware to look for ways to make it work for them. Many devs also have a high income, so price isn't as important anymore.


A big part of saving Apple was Jobs killing the clone program. That lesson probably still resonates in the halls of Apple even if allowing hackintoshes is a different thing without the same risks.


I would be extraordinarily grateful for some insight into why this comment was downvoted.


I think the parent comment is just completely ignoring the argument of the post they reply to.

Just looking at the first sentences:

GP: > I have zero issues with an Apple premium or paying a lot for hardware.

parent: > the hackintosh/enthusiast market [...] are the most price-conscious segment


Then buy the new Mac Pro? I don't understand why that's not an option for GP.


Because as the sibling comments point out, the price of a Mac Pro isn't just an "Apple Tax^WPremium" over a desktop machine but is an order of magnitude more expensive (assuming you don't care about workstation-class components, i.e. Xeon Ws, Radeon Pro GPUs and ECC RAM).

There's an enormous price gap between a Mac Mini and the Mac Pro (especially when the Mini now has higher single-threaded performance than the base Pro...) which Apple has widened in the last decade or two.


I've had a continuous string of Mac Pros, from the G3 to the 2012 (MacPro 5,1), my main workhorse. I have continually updated and expanded it.

The 2013 mac pro was a mess. pass.

The latest mac pro... I think it wasn't just expensive, it was sort of sucker expensive.


> The 2013 mac pro was a mess.

I appreciate that the 2013 mac pro wasn't for you, but it was perfect for me: small but powerful. Firstly: RAM. I was able to install 64 GiB on it, which enabled me to run Cloud Foundry on ESXi on Virtual Workstation on macOS. Non-Xeon chipsets maxed-out at (IIRC) 16 GiB and then later 32 GiB—not enough.

Secondly, size & esthetics: it fits on my very small console table that I use as a desk. I have a modest apartment in San Francisco, and my living room is my office, and although I had a mini-tower in my living room, I didn't like the looks.

Third, expandability: I was able to upgrade the RAM to 64 GiB, the SSD to 1 TB. I was able to upgrade the monitor to 4k. It has 6 Thunderbolt connections.

My biggest surprise was how long it has lasted: I typically rollover my laptops every year or so, but this desktop? It's been able to do everything I've needed it to do for the last 7 years, so I continue to use it.

[edited for grammar]


While the form factor was cool, how pissed would you have been if it broke and you were buying the exact same machine, for the same price (give or take), in 2018?

Part of the "mess", I'd argue, was that Apple backed themselves into a thermal corner where they couldn't update the machine but also wouldn't cut its price so it got steadily worse value as time wore on.


> but also wouldn't cut its price so it got steadily worse value as time wore on

This has long been an issue for Apple products. It's why the best time to buy an Apple product is right after an update.


You're not wrong but the Mac Pro went a particularly long time between updates.


Oh, definitely. Look at the Apple TVs for another example. In both cases, if Apple would drop the price, even just yearly, they would sell so many more units.


But my workhorse has had so many upgrades. Lots of storage in and out. I have a bunch of drive sleds. I updated the graphics card more than once. Presently it has 2x 6-core CPUs, 5 SSDs (one in a PCIe slot), a 10TB hard disk, a PCIe USB 3 card, and a GTX 980.


I just got a new Mac Pro. The only real upgrade I did from Apple was to the 12-core Xeon. Other than that I kept the base 32GB of memory, though I did upgrade from the 256GB base offering to a 1TB SSD.

... then I went to NewEgg and got 192GB of memory for $800ish, rather than Apple's exorbitant $3,000. And seriously, why? Same manufacturer, same specs. And convenience factor? It took a good 45 seconds to install the memory, and I'd wager anyone could do it (it's on the 'underside' of the motherboard, all by itself, and has a little chart on the memory cover to tell you exactly what slots to use based on how many modules you have).

And then I bought a 4x M.2 PCIe card and populated it with 2TB SSDs (that exceed the Apple, with sustained R/W of 4500MB/s according to Blackmagic) for just around $1,100, versus the $2,000 Apple wanted. Only downside is that it cannot be the boot drive (or maybe it can, but it can't be the _only_ drive).


> The latest mac pro... I think it wasn't just expensive, it was sort of sucker expensive

It's the kind of Mac that makes you get an iMac to put on your desk and a beefy Linux server-grade box you hide somewhere, but that does all your heavy lifting.


Yep, that's what I did.


Some tools and OSs make it easier than others. I used to do a lot of work from my IBM 43P AIX workstation (great graphics card, huge monitor, Model M keyboard) for work that actually ran on a more mundane Xeon downstairs. X even made it practical to browse the web on the 43P. It attracted some really confused looks in the office.


Exactly this. The closest to this would be an i7 iMac, but not everyone wants an AIO PC. It's kind of a bummer. We finally have an iPhone for everyone, even a high-end small-form-factor option. Whoever is responsible for that decision, please take a look at the Mac lineup next.


There's even precedent for it: the iMac/iMac Pro. The Pro model has workstation-class hardware in it while the non-Pro does not.

Ideally the enhanced cooling from the Pro models would trickle down to the non-Pro. By all reports the (i)Mac Pro is virtually silent but in the low-power ARM world a desktop machine that size could almost be passively cooled, even under load.


Give them time maybe?

I bet Apple would love to release an all-in-one iMac Pro powered by an iteration on the M1. They could put a Dolby Vision 8k display in it and drag race against Threadripper machines for UHD video workloads.


Part of me laughs that an ARM chip could compete with a Threadripper, the other part of me seriously thinks it could happen.


I mean, the iMac Pro came out in 2017 and there isn't much sign of anything trickling down to the standard iMac. Rumour is that the ARM Mac Pro will be significantly smaller than the Intel one - it'll be interesting to see how (or if) they support discrete GPUs.


I don't totally agree with GP, but I think their overall point was that for a long time (all of the 2010s?) there was just no decent Mac Pro.

Outside of the Mac mini, the most powerful desktop machines were actually iMacs, with all the compromises that come with the form factor, and the trashcan Mac Pro, which was thermally constrained.

In that period, no amount of money would have gotten you peak storage + network + graphics performance, for instance.

We are now in a slightly better place where as you point out, throwing insane amounts of money towards Apple solves most of these issues. Except for those who don't want a T2 chip, or need an open bootloader.


Agreed. Do not see anything worth downvoting at all.


Agree completely.

I don’t know that the “Apple tax” moniker is really fair anymore, either.

The machines have always commanded a premium for things that enthusiasts don’t see value in (I.e. anything beyond numeric spec sheet values), so most critics completely miss the point of them.

There's a valid argument to be made that they're also marked up to higher margins than rivals even beyond the above, but I'm not sure if any end user has really ever eaten that cost - if you buy a MacBook, there has always been someone (students) to buy it back again 3/5/10 years down the road for a significant chunk of its original outlay. That doesn't happen with any other laptop - they're essentially scrap (or worth next to nothing) within 5 years. After 10 years I might actually expect the value to be static or even increase for its collector value (e.g. clamshell iBook G3s).

The total cost of ownership for Apple products is actually lower over three years than any rival products I’m aware of.


> The machines have always commanded a premium for things that enthusiasts don’t see value in (I.e. anything beyond numeric spec sheet values), so most critics completely miss the point of them.

It's not just intangibles. I really like using Macs, but my latest computer is a Dell XPS 17. This is not a cheap computer if you get the 4k screen, 64GB of RAM and the good graphics card. At those prices, you should consider the MBP16. The MBP is better built, has a better finish and just feels nicer.

Thing is, Dell will sell me an XPS 17 with a shitty screen because I don't care about the difference and would rather optimise battery life. I can get 3rd party RAM and SSDs. I can get a lesser graphics card because I don't need that either. I can get a more recent Intel CPU. And I can get the lesser model with a greater than 25% discount (they wouldn't sell me the better models with a discount though).

I think some of the Apple Tax is them not being willing to sell you a machine closer to your needs, not allowing some user-replaceable parts, and not having discounts.


It works both ways: if you get something in Apple hardware, you will get the nice version of it. If you can't get something there, you will have to do without.

Example: I've been looking at the X1 Nano. It is an improvement compared to other lines (it finally has a 16:10 display!), but it is still somewhere in the middle of the road.

The competitor from Apple has a slightly better display, much better wifi, and no option for LTE/5G.

The Nano has a 2160x1350, 450-nit display with Dolby Vision. Apple has a 2560x1600, 400-nit (Air)/500-nit (MBP) display with P3. The slightly higher resolution means that Apple would display 9 logical pixels using 8 physical when using the 1440x900@2x resolution (177% scale), but to get a similar scale on the Nano that would mean displaying 8 logical pixels using 6 physical (150% scale). Similarly, Dolby Vision is an unknown (how could it get used?), while P3 from Apple is a known quantity.
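If it helps, the scale percentages above appear to fall out of comparing each panel's width against a 1440-pixel-wide logical desktop (my reading of the numbers, not the poster's own working):

```python
# Where the quoted scale percentages seem to come from, assuming a
# "looks like 1440x900" logical desktop on each panel.
logical_width = 1440
panels = {"MacBook (2560x1600)": 2560, "X1 Nano (2160x1350)": 2160}
for name, physical_width in panels.items():
    print(f"{name}: {physical_width / logical_width:.1%} scale")
# MacBook (2560x1600): 177.8% scale
# X1 Nano (2160x1350): 150.0% scale
```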

The X1 Nano has 2x2 MIMO wifi - an Intel AX200 - with no option for anything better. There are only two antennas in the display frame, and you cannot add more (ok, 3, but the third one is for cellular and cannot be used for wifi if you forego cellular). Apple ships with 4x4 MIMO. If you have a decent AP at the office or at home, it is a huge difference, yet no PC vendors are willing to improve here.

The cellular situation is the exact opposite. You can get a cellular module for ThinkPads, but you cannot for Apple at all, so if you go this route, you have to live with workarounds.


Yes and no. To be honest I did the same back-of-the-napkin math that you did prior to buying my MBP - the thing is the TCO is even worse if you customise the machine.

Example - a Mac is a Mac for resale purposes - if I attempt to later sell an XPS that I've opened up and put an SSD in and a couple of SODIMMS - I now need to recoup my cost on all of those things. The problem is that if someone is looking at a used XPS with upgraded SSD and upgraded RAM they're statistically unlikely to fully investigate and value the (probably really good) parts that you upgraded it with - they're just going to see X,Y,Z numbers and price accordingly.

Generally though, a 5-year-old Windows laptop with 16GB of RAM still commands the value of a 5-year-old Windows laptop, as best I could tell looking at resale values.


I wasn't trying to address the resale value, only the tax part. The perception of the tax comes from Apple simply not offering compromised parts for a particular set of parameters, and from other manufacturers being willing to sell at large discounts regularly!


Can’t argue with this. They gouge absurdly on storage and memory.


> I don’t know that the “Apple tax” moniker is really fair anymore, either.

I think it's still accurate, and honestly that's Apple's business model.

I think the resale value for a student macbook doesn't really matter. It still costs the student - while they are poor - as much as 4x what other students pay for their laptop. Many students are paying $250 for their laptop.


I threw out a laptop I had from 2008 that was top-of-the-line at the time: $3000. I bought it in the US when I was there on vacation. This was when the dollar was at such a low that I got the device for the equivalent of something like €1200 at the time.

I couldn't sell that device for half of that a year and a half later. I got a newer laptop in 2016, again very specced out for a laptop, for about €1800; I couldn't sell it for €800 two years later. I still use that last one because I didn't want to sell it so far under what the market value should be.

If you try to sell anything Apple related that isn't more than 5 years old you won't have that problem at all. You can get a good value for the device and sell it without too much of a hassle.

Even if you're a student you would likely be better off buying the cheapest macbook you can find (refurbished or second hand if needed). If you don't like the OS you can just install Windows or a Linux distro on it.

Are you sure about that 250$ number? Because I don't think that's a very realistic number.


> Are you sure about that 250$ number? Because I don't think that's a very realistic number.

For note taking, word processing, basic image editing, web browsing, video playing, etc, you can easily get a capable enough laptop for that price.

This is not comparing like-for-like in terms of what the machines can do, of course. Apple's range doesn't even remotely try to cover that part of the market so a direct comparison is unfair if you are considering absolute price/capability of the devices irrespective of the user's requirements, but for work that doesn't involve significant computation that bargain-basement unit may adequately do everything many people need it to do (assuming they don't plan to also use it for modern gaming in non-working hours).

> If you try to sell anything Apple related that isn't more than 5 years old you won't have that problem at all.

Most people don't consider the resale value of a machine when they buy it. For that to be a fair comparison you have to factor in the chance of it being in good condition after a couple of years' use (this will vary a lot from person to person) and the cost of any upgrades & repairs needed in that time (again more expensive for Apple products by my understanding).

And if you buy a $500 laptop and hand it down or bin it, then you are still better off (assuming you don't need a powerful machine) than if you dropped $3,000 for an iDevice and later sold it for $2,000.

> what the market value *should* be.

"Market value" is decided by what the market will bare, not what we want to be able to sell things for, and new & second hand are often very different markets.


> Are you sure about that 250$ number? Because I don't think that's a very realistic number.

I'm not a student but it's pretty close I think.

I invested $350 into a Chromebook that runs native Linux[0] about 4 years ago and it's still going strong as a secondary machine I use when I'm away from my main workstation.

It has a 13" 1080p IPS display, 4gb of memory, an SSD, a good keyboard and weighs 2.9 pounds. It's nothing to write home about but it's quite speedy to do every day tasks and it's even ok for programming where I'm running decently sized Flask, Rails and Phoenix apps on it through Docker.

If I had to use it as my primary development machine for web dev I wouldn't be too disappointed. It only starts falling apart if you need to do anything memory intensive like run some containers while also running VMs, but you could always spend a little more and get 8gb of memory to fix that problem.

I'm sure nowadays (almost 5 years later) you could get better specs for the same price.

[0]: https://nickjanetakis.com/blog/transform-a-toshiba-chromeboo...


Right, except that when you stop using the Chromebook every day or move on to something better, will it have residual value or just go to landfill?

I love Chromebooks, don't get me wrong, but the problem I've come to realise over time is that many are specced and priced just about at a point where they'll quickly move into obsolescence not long after purchase - at which point the only thing keeping them out of the ground is your willingness to tolerate them after the updates have stopped.

The Mac will still be worth a good chunk of money to someone.

I have a Chromebook Flip here that I adored for several years that I couldn't give away now.


To answer your question accurately will depend on how long it ends up lasting for.

For example if it works well enough for another 4 years, now we need to ask the question on whether or not you could get reasonable value out of an 8+ year old Mac. I never sold one so I'm not sure. My gut tells me it's going to be worth way less than what you bought it for even if it's in good condition.

But more generally, yeah I have no intentions on re-selling this thing if I decide I'm done with it before it physically breaks. I'd probably donate it or give it away for free (if someone wanted it).

I don't see that as too bad tho. If I can get 7-8 years out of a $350 device I'm pretty happy, especially if the next one costs about the same.

It's a tough comparison tho because a decently decked out MBP is going to be like 8x as expensive but also have way better specs.


I think it is realistic. It's easy to think people have money to spend on a $999 laptop when you are living in a first-world country. 90% of the world probably couldn't afford that.


> Are you sure about that 250$ number? Because I don't think that's a very realistic number.

I think it's fairly realistic. The Dell Latitude 7250 is probably a good representative of what you can get used for ~$220-$300 US these days: https://www.ebay.com/sch/i.html?_from=R40&_trksid=p2380057.m... The dual-core processor should still be serviceable for everyday work, at ~1.3kg it's light enough to carry around all day, a 1080p resolution should be OK on a 12" screen, and it can take up to 16GiB of RAM, though holding out for one with 16GiB preinstalled will definitely tend to push the cost up to nearer $300: https://www.ebay.com/sch/i.html?_from=R40&_trksid=p2380057.m...

(Then any laptop with similar specs except with a 2-in-1 form factor tends to cost a fair bit more, but that's not a must-have for most students or anyone who might have been considering a MacBook.)


> Are you sure about that 250$ number? Because I don't think that's a very realistic number.

I got an i7 T430s for 200€ a few years ago and it's still plenty fast for coding, so I don't see why this number wouldn't be realistic.


"I think the resale value for a student macbook doesn't really matter"

It's literally the only thing that matters if you're the seller.

If you have your choice of two items to sell 5 years from now, you ideally want to be selling the item that's worth substantially more to the buyer, rather than trying to sell something worthless.

Assuming there's nothing dishonest happening, it's really up to the market to price.

Thing is, the student buying the MacBook is probably going to be substantially better off that way too, in that it will likely retain proportionally more of its value from that point on.


> I don’t know that the “Apple tax” moniker is really fair anymore, either.

Apple on the new Mac Pro that I got a month ago: 192GB memory? That will be $3,000. NewEgg? We'll sell you the same specced memory from the same manufacturer for $800. And you get to keep/sell the baseline 32GB memory.

8TB SSD? $2,000, thanks. OWC and NewEgg? Here, have a PCIe 4xM.2 card and 4 2TB SSDs for $1,100. Oh, and they'll be 50% faster, you just can't have them as the only drive on the system (my Apple SSD runs at around 2800MB/s, the alternative, 4500MB/s).

So they are entirely marked up, and look in any forum - by far most people are not doing what I'm doing, and are just "going straight Apple for convenience", though the memory installation took less than 1 minute, and the SSD installation less than 5, including unboxing, seating the 4 drives, reinstalling the heatsink on the card, and installing it. I get "my time is money", and "it just works" (which, as we know, is less and less the case with Apple), but really, for me, that was a $3,100 savings for <10 minutes of effort.


In terms of high performance products, I’m actually really excited for the next Mac Pro. They’ve got novel design options open to them that no rival has.

The M1 costs Apple relatively little to produce per unit - I would expect them to keep the overall design for a Mac Pro but have stacked modules, such that the side wall of the Mac Pro is a grid of 4 or more such modules, each with co-located memory like the M1 has. Obviously performance would depend upon the application being amenable to a design like that, but a 32- or 64-core 5nm Mac Pro is not out of the question, and would be impossible for any Hackintosh to match for performance in the next few years.

Even after capacity frees up at TSMC for AMD to move to 5nm, they won’t be able to co-locate memory like the M1 does due to standards compliance with DRAM sticks.

I think the next couple of years will be really turbulent for other vendors - the M1 is likely far more significant for the PC market due to how disruptive it is than it is for the Mac market.


It might force Intel / AMD / Broadcom to get serious about hardware and at least integrate more of the components for notebooks. Maybe not go full-on OEM, but a lot more than the CPU, because the M1 is probably fundamentally winning on SoC design.

I would like to know if they are using fundamentally better batteries, and how much a 5nm process lead is behind this.

But I will hand it to Apple if they finally did something to break the 4-8 hour battery-life limit, a limit that always seemed to stay the same despite node shrink after node shrink after node shrink, with roughly the same on-screen performance for the usual browsing/productivity applications.

I was pretty distrustful of the ARM move, but if they deliver this for the Macbook Pro, I'll hop to ARM.

Associated with the CPU people "getting serious" is them pushing an OS, which would have to be Linux. Intel should have done this 20 years ago, at least as leverage to make Windows improve itself.


AMD would be able to do DRAM on package for the lowest wattage "ultrabook" chips, at the cost of producing a very different package for them vs. the bigger laptops that are expected to have upgradable SODIMMs. But I doubt that this "co-location" is that huge for performance. Whatever memory frequency and timings Apple is using are likely easily achievable through the regular mainboard PCB, maybe at the cost of slightly more voltage. DDR4 on desktop is overclockable to crazy levels and that's going through lots of things (CPU package - socket pins - board - slots - DIMMs).

> stacked modules such that the side wall of the Mac Pro is a grid of 4 or more such modules each with co-located memory like the M1 has

Quad or more package NUMA topology?? The latency would absolutely suck.


Why would latency suck? 64 cores are already only beneficial for algorithms which are parallelizable -- with the most common class of parallelizable algorithm being data-parallel... So -- shouldn't the hardware and OS be able to present the programmer the illusion of uniform memory, and just automatically arrange for the processing to happen on the compute resources closest to the RAM / move the memory closer to the appropriate compute resource as required?


Yeah, I'm no kernel developer, but I've been replying to anyone saying 'just stick n * M1 in it' that even AMD has been trying to move back to more predictable memory access latency, less NUMA woes.


But in general we're moving toward even less uniform memory, with some of it living on a GPU. NUMA pretended that all memory was the same latency, because C continues to pretend we're on a faster PDP-11, but this seems like a step in the wrong direction given how high-performance computation is progressing.


> I have zero issues with an Apple premium or paying a lot for hardware.

Especially if the margins allow them to not engage in silliness on the software side of things like violating privacy and serving ads in the OS.

There are certainly places that Apple can be criticized, but I think in these two areas they're acting pretty well.


What I don't understand: Windows 10 Pro comes with tons of pre-installed junk - Candy Crush etc. Just charge what you need to; businesses don't want this stuff.


Well, large businesses at least get Windows 10 Enterprise, which doesn't come with all of that nonsense. The real shame is that you can't get Windows 10 Enterprise without a volume license.


You can get Win 10 Pro. I don’t remember my system having any of that junk preinstalled


It installed itself on any Win 10 machine I saw (most of which run Pro), except those running Enterprise (or LTSC/LTSB), or an education license.

Er, I heard that junk doesn’t install itself on “Pro for Workstations,” but I’m not certain and even then that’s another hundred dollars more expensive than Pro.

And even Enterprise still comes with a lot of junk that, like, 90% of users won’t need. Windows AR? Paint 3D? And so on… half the things in the start menu of a stock Windows 10 Pro install are either crap like Candy Crush, fluff like 3D viewer, or niche like the Windows AR thing.

The worst part about this is that there's definitely a middle ground between not including anything and pushing crap on people; nearly every Linux distro I have ever seen, as well as Apple, nails that balance. And to be frank, with the App Store or Microsoft Store and such, I really don't see the need to include much of anything.


You do get obtrusive telemetry (Cortana is a good example), but you avoid Candy Crush et al.

You can disable some of the telemetry during install as well.


Just run any of the reclaim series. Pure gold if you ask me.

https://gist.github.com/alirobe/7f3b34ad89a159e6daa1



You're correct that they aren't preinstalled, but once you connect to the internet they will be downloaded and installed automatically.


You can get a 90 day demo of Windows 10 Enterprise from Microsoft: https://www.microsoft.com/en-us/evalcenter/evaluate-windows-...

Install it in a virtual machine, every 90 days make a new virtual machine from scratch. That or use the secret code to reset the demo days. Enter an Enterprise key when you want to register it for real.


Or just run Linux.


you are kidding, right?


There's 'a lot' ($2-3k) and there's 'silly, can never justify unless I'm in some super-niche segment of audio/video production' ($10-20k).


> it won't help in areas that Apple just organizationally doesn't care about/doesn't have the bandwidth for because that's not a technology problem

I would posit that Apple is always going to keep macOS working on some workstation-class hardware, just because that kind of machine is what Apple's software engineers will be using internally, and they need to write macOS software using macOS.

Which means one of two things:

1. If they never release a workstation-class Apple Silicon chip, that'll likely mean that they're still using Intel/AMD chips internally, and so macOS will likely continue to be compiled for Intel indefinitely.

2. If they do design workstation-class Apple Silicon chips for internal use, they may as well also sell the resulting workstation-class machines to people at that point. (Or, to rearrange that statement: they wouldn't make the chips if they didn't intend to commercialize them. Designing and fabbing chips costs too much money!)

Which is to say, whether it be a Hackintosh or an Apple Mac Pro, there's always going to be something to cater to workstation-class users of Apple products — because Apple itself is full of workstation-class users of Apple products.


> I would posit that Apple is always going to keep macOS working on some workstation-class hardware, just because that kind of machine is what Apple's software engineers will be using internally, and they need to write macOS software using macOS.

I hope I'm not out of line here, but this is not what a "workstation" is. "Workstation" actually has a specific meaning in the realm of enterprise computing solutions, and developers do not (generally) use workstations.

A workstation is something that, say, the people at Pixar use, or Industrial Light and Magic. It's an incredibly powerful machine that can handle the most intensive of tasks. Software development is generally not such a task, unless you're frequently re-compiling LLVM from source or something. (And even then, it's a world of difference.)

Apple's software developers, like most software developers who use Apple machines, use MacBook Pros (for the most part). Sometimes Mac Minis if they need multiple test machines, and I'm sure there are some who also have Mac Pros. But overwhelmingly, development is done on laptops that they dock while at work and take home with them after. (This was my experience when I interned there, anyway.)


Apple develops not just macOS, but also application software like Logic and FCPX. The engineers writing that code need to test it on full-scale projects (probably projects on loan to them from companies like Pixar.)

But moreover, changes to foundational macOS libraries can cause regressions in the performance of this type of software, and so macOS developers working on systems like Quartz, hardware developers working on the Neural Engine, etc., also work with these apps and their datasets as regression-test harnesses.

See also: the Microsoft devs who work on DirectX.

All of this testing requires "workstation" hardware. (Or servers, but Apple definitely isn't making server hardware that can run macOS at this point. IIRC, they're instead keeping macOS BSD-ish enough to be able to write software that can be developed on macOS and then deployed on NetBSD.)


Are Pixar and ILM really using workstations as you describe them, or render farms?


"I would posit that Apple is always going to keep macOS working on some workstation-class hardware, just because that kind of machine is what Apple's software engineers will be using internally, and they need to write macOS software using macOS."

I have always hoped that we could rely on that heuristic - that internal Apple usage of their own products would guarantee that certain workflows would be unbroken.

In practice, this has never held up.

Over the past 10-12 years it has been reinforced over and over and over: Apple engineers use single monitor systems with scattered, overlapping windows which they interact with using mousey-mousey-everything and never keyboard shortcuts.

They perform backups of critical files - and manage financial identities - using their mp3 player.

The fact that multiple monitors - and monitor handoff - is broken in fascinating new ways with every version of OSX tells you how Apple folks are (and are not) using their own products.


It sounds like you have an issue with the lack of window snapping keyboard shortcuts that are in Windows 10, as well as iPhone backups happening in iTunes until they moved to finder, and iCloud being connected to iTunes although it's managed in SysPrefs. And you have seen some regressions along with the successive improvements to display management. Is that fair?

If so, what is the connection to professional workflows on macOS?


Yeah, I think it's pretty clear, based on all the bugs with multi-monitor support down to their Mac minis, everyone at Apple must be running iMac Pros, and maybe they're using Sidecar to make their iPad into an extra screen.


If you don't have issues with paying a lot for hardware, why don't you buy a Mac Pro?


A machine that won't outperform the new Mini: $6,000

16 cores, 32GB of RAM, the least powerful GPU on the list, 2TB of storage: $8,799.00

28 cores, 48GB of RAM, the same GPU, 4TB of storage: $14,699

MSRP on a 2020 Toyota Corolla: $19,600

AMD Ryzen Threadripper 3970X 32-Core 3.7 GHz Socket sTRX4: $2,629.90

Cost of the same basic GPU: about $219

Cost of a complete system equivalent to the almost-car-priced Mac: about $4,200

9k-15k isn't "a lot", it's a crazy amount. 15k is 1/4 of the median household's income.

Most of planet Earth can't sink 6,000 into a computer, let alone 15,000. Under Apple, the standard expandable board-in-a-box with room to expand is a category available to 1% of the US and 0.1% of the world.


Exactly this. I was fine with the Mac Pro until 2012: it was a little more expensive than PCs, but not outrageously so (maybe 30% more, and that's a tax I was fine paying given the OS and that it was quite a well-built machine).

The new Mac Pro is 3-4x the price of a machine built around AMD with equivalent performance. I'm building a Threadripper for exactly this reason. Most of the issue is Intel vs AMD, the fact that AMD's Threadrippers are an amazing deal when it comes to performance per dollar, and that Apple has an aversion to offering decent GPUs.


Yes, it is not a commodity machine, but it sort of goes beyond expensive to almost insulting.

If it were half the price, I think it could fairly be called premium priced.


There is also the iMac and iMac Pro range. The Mac Pro is clearly demarcated as a "money is no object" product.


I'd be much more interested in the iMac if it wasn't built inside a monitor.


The M1 was released on the MacBook Air, MacBook Pro, and Mac Mini.

Buying an iMac now would seem to be a poor decision.

From what I'm seeing in some of the comments, people are so lost in the history of the past years of Apple being the pooch ridden from behind on performance, that they can't get their heads out of their arses to see how awesome this is.

I am sitting here right now wondering if I should invest more of my savings directly in Apple stock, at least temporarily to ride their sales wave, or if I should buy a Mac mini and a nice wide curved monitor with a mechanical keyboard from WASD and be f'ing awesome all of a sudden.

The only reason I'm not hitting the buy button is that all of that isn't $135. There's no particular reason for that amount, but if it said $135, I'd have already paid for it and been drinking beer to celebrate the happiest purchase I ever made.


The new Mini is certainly impressive and suitable for many tasks, but not a replacement for a full desktop machine. Memory and storage are very limited, and the GPU, while great for the MacBook Air, is far from desktop performance. Also, for desktop, the ports selection is very limited.


Given that they just bumped the iMac in August, we might be waiting a bit before we start to see ARM iMacs.


The iMac in many senses isn't a replacement for a proper desktop. You can't expand the disk storage, many iterations had no great graphics card options (this seems to be somewhat better now), and an upgrade means having to replace everything, including the screen. You can't even clean out the fans after a few years.

Yes, I own an iMac, as this is the closest thing to a desktop machine Apple sells, but a replacement for what the Mac Pro used to be, it is not.


3970x 32-core is < $2000

I got curious...

  - amd 3990x 64-core 128 thread    $3849
  - 256gb g.skill ddr 3600          $978
  - noctua nh-u14s cooler           $80
  - samsung 980 pro 1tb nvme pcie4  $229
  - evga 1000w power supply         $207
  - fractal design define 7         $170
  - asus trx40-pro mb               $396
  - nvidia rtx 3080                 $699 (?)
  - steve jobs: a biography         $15  (hardcover)

  = a really maxed out system       $6623
probably 1/2 that for a 5950x/am4 system

EDIT: ok, I had to know...

  - amd 5950x 16 core 32 thread     $799
  - 128gb g.skill ddr 3600          $489
  - noctua nh-u12s cooler           $60
  - samsung 980 pro 1tb nvme pcie4  $229
  - evga 850w power supply          $139
  - fractal design define 7 compact $130
  - asus x570-pro mb                $240
  - nvidia rtx 3070                 $499 (?)
  - steve jobs: a biography         $14  (kindle)
  - linux with kde mac-look icons   $0

  = a really really great system    $2599


Looks pretty nice, but for many you'd be better off with:

  - AMD 5600x or 5700x (saving $500 or $400)
  - Samsung 970 pro 2TB for $229 (twice the space for the same price)
  - rtx 3060 (in a few weeks)

You'll save a fair bit (around $600), run quite a bit cooler, it will be much easier to keep quiet, and you'll have twice the disk space. Or buy 2x2TB NVMe (motherboards with 2x m.2 slots are common these days).

Sure, the 5600x/5700x isn't as fast in throughput, but how often do you max out more than 6/8 cores? Per-core performance is near identical, and with more memory bandwidth per core you run into fewer bottlenecks.

I bet that over a few years more people would notice the doubled disk space than the missing extra cores.


I don't use Apple products, but I think the high price of a workstation is not something specific to Apple.

From a discussion I had with a friend recently, I found that Precision workstations from Dell or Z workstations from HP have similar prices for similar performance (sometimes prices can reach 40k or 70k dollars).

When comparing the Mac Pro to an enthusiast PC build, yes, the Mac Pro is "overpriced", but the Mac Pro is using a Xeon, which is pricier than a Ryzen (even if performance-wise it's inferior), and a pro GPU, which also costs more than a consumer GPU (again, even if performance is inferior). The price of an Nvidia Quadro is always higher than a GeForce GPU with the same specs.

You can see a spec/price comparison I did when having the discussion with my friend here : https://mega.nz/file/Nj4UnSJR#fBdZfn3zoZ8boxap35-GWEgDlicH3R...


It's not just "enthusiast PC builds". You can buy plenty of PCs in that middle category of high-but-not-extreme performance without the certifications, Quadros and vendor markup that a full-on "workstation" model has. And they're perfectly fine choices for many professional use cases.

For that market, the Mac Pro is overpriced (the high-end Dell/HP workstations are too), and Apple doesn't make anything more suited for it. That's the criticism. That the Mac Pro is acceptably priced compared to the Dell/HP workstations doesn't matter if that's not what you need.


My bad, I misunderstood the context of the comment. Thanks for the clarification.

For me personally, outside of laptops and phones, I don't see the appeal of using Apple hardware (unless you want to use MacOS X).


> I think the high price of a workstation is not something specific to Apple.

A professional workstation, with support, services, guaranteed replacement components, guaranteed service-life, maintenance contracts and so on is very different from an enthusiast-built machine.

It's like comparing a BMW M3 to a tricked out VW Golf. You can fit a bigger, badder engine under the VW's hood, stiffen the suspension, replace the gearbox and so on but, in the end, you can get one straight from the dealer and not everyone is inclined to assemble a car from parts.

Did that once. It's fun, educational and not very practical.


Well, VW does sell tricked-out VW Golfs straight from the factory (GTI, R) :) Though it's more of a competitor to the M135i than the M3.


Exactly. You can make a VW Golf that performs like an M3, but, in the end, it'll be a lot of work for an unreliable car.


And the M3 is reliable? BMWs are famous for being leased since repair costs will kill you once your warranty is up.


Depends on the model. The E-series 3-ers were very reliable; for the F-series and newer it is exactly as you wrote.

The leasing thing is for a slightly different reason: BMWs are bought by a market segment that always wants something new. They would not drive an older car even if it were reliable; it would not be cool enough. Unfortunately, circa 2010 BMW also figured this out, and since then their cars have stopped being good -- they don't have to last -- and are just expensive.


Yes and no. 15k hardware is for people who are using it professionally. By "professionally" I mean that they can put such a purchase down as a business cost and just pay less tax. From my perspective it does not matter if I pay ~15k to the tax office or to Apple.

In Europe the incentive to buy expensive goods as a company (like cars, fancy office furniture, etc.) is even bigger because of VAT (much bigger than US sales tax).


That just isn't how taxes work. If you spend 15k of your profits on capital goods, you don't reduce your taxes by 15k, because no one was going to charge you 100% tax on that money. You only save the forgone tax rate on the purchase.


And depreciation.


Or you can put your heavy workloads in the cloud and pass it off as opex.


which is the way to go


It depends. If it's a constant workload, it may be better to lease the server and operate it on-prem. If it's spiky, then cloud and on-demand/spot is the best option.


You can get the VAT off, sure, but on the rest you just get to pay out of pre-tax profit. In the UK that means 20%.

So a machine that's £5k retail becomes £4167 without VAT, effectively £3333 if you take into account tax savings, which only apply if your company is in profit. A £15K machine effectively would still cost you £10k.

It's a big saving, sure, but it's still a very expensive machine.


Some people pay more than 20% tax. At 50% tax the savings look pretty good.


That tax rate is corporation tax, not a personal tax. Does any European country have a 50% corporation tax?

You're right if you start looking at "Well I run my own company so the cost compared to paying myself that cash as a dividend is much smaller", but that only really applies to those of us who do run our own small companies, own them fully and run them profitably, and have already pumped their personal earnings up to that level. And then we're on to a question about what that box is for and why it's needed, is it a company asset or a personal one?

And remember that you get to apply the same percentage discount to any other machine - your 15K apple box may come down to a conceptual £5K hit on your pocket, if you're paying 50% personal tax on top of the company taxes, but a £4-5k Zen 3 box with dual nvidia 3090s in it will come in at £1333-£1600 by the same metric and quite likely perform better...


If you can run your business on a Zen box with an Nvidia 3090 in it, good for you!


I mean, if you're not running macos-specific stuff, then a top-end Zen3 box with a couple of 3090s in it is going to have more grunt than a 15k mac pro with a Xeon and a Vega II Duo.

But I wasn't really here to talk about comparative value anyway - this was a tax discussion!


Not sure what needs to be discussed there, I know how much tax I pay and which tools I need to get my job done.


Back at the top you seemed to be saying that the entire spend would come off your tax; that's why people picked you up on it, I think.


I think you are talking to the wrong poster ;-)


Sure it's not the first time!


I'm not sure if a VAT (Value Added Tax) deduction is gonna save anyone's Christmas considering the base Mac Pro is 5320 EUR (6 333 USD) tax-free.


Doesn't this mean that you cannot use the computer for personal use, or that you can only reclaim the business-use proportion of that VAT?


Strictly speaking, yes, there is the expectation that the machine is used entirely for business purposes, if the business is paying for it, otherwise it might be considered a benefit in kind. It's not so much about VAT then as PAYE.

However I feel no particular guilt that the workstation I use for my full-time dev day-job also has a windows partition for gaming in the evening, and I hope that the tax authorities would see things the same way! It's not like the asset isn't a justified business purchase.


Obviously “I’m willing to pay a lot” can mean a wide range of things, but pretty clearly the comment is talking about paying a moderate premium over the competition. The same way a MacBook Pro model might cost $2500 where you could get a similarly specced Windows laptop for $1800. Or an iPhone might cost 30% more than a similar Android flagship.

It’s an order of magnitude different with the Mac Pro, the base model is a $6000 machine that will perform like a ~$1500 PC. And the base model makes no sense to buy, it’s really a $10k-$30k machine. It’s a completely different product category.


I don't mind paying for a new Apple, but I do mind paying for repairs on a system that fails just after the warranty runs out. Paying $1500 for a new computer every 5 years or so is good. Paying $1500 for a computer every 16 months, not so much.

The monitor on my MacBook Pro just died, and I bought it July of last year. The repair was about $850 USD. Luckily my credit card covered the hardware warranty, but I'm kind of wishing I'd bought AppleCare.


AppleCare for the Macs is definitely worth it. They even replaced the display on my MBP after 5 years when I called (the key seems to be to call someone at Apple directly). This was a part they should've recalled, but anyway I wouldn't have gotten a new display without AppleCare.


There was a display replacement program for 2012-2015 MBP13s, due to the staingate. Though I recall that it was for 4 years since date of purchase, or so.


Yep it was related to that, even though AppleCare ended up replacing my screen after 6 years (not 5 like I initially remembered). It was an unexpected bonus and I’m still quite happy about it. But the key was to call AppleCare directly and not go to the geniuses.


I wonder if there will be one for 2019 MacBook Pros. The repair shop said the replacement display was also faulty, so they had to order a replacement replacement.


Is this the connector problem where the screen doesn’t turn on?


Unless absolutely everything in the PC fails at once, you can just repair it and change components as needed. Not really doable with a Mac given that you can't buy the components from Apple.


Because even a $3K non-Apple Machine outperforms the $5K base model Mac Pro by a large margin, and if I spent $5K outside Apple it would be even more ridiculous, double RTX 3090 + 24 core Threadripper ridiculous.


This is fascinating:

> Retain and release are tiny actions that almost all software, on all Apple platforms, does all the time. ….. The Apple Silicon system architecture is designed to make these operations as fast as possible. It’s not so much that Intel’s x86 architecture is a bad fit for Apple’s software frameworks, as that Apple Silicon is designed to be a bespoke fit for it …. retaining and releasing NSObjects is so common on MacOS (and iOS), that making it 5 times faster on Apple Silicon than on Intel has profound implications on everything from performance to battery life.

> Broadly speaking, this is a significant reason why M1 Macs are more efficient with less RAM than Intel Macs. This, in a nutshell, helps explain why iPhones run rings around even flagship Android phones, even though iPhones have significantly less RAM. iOS software uses reference counting for memory management, running on silicon optimized to make reference counting as efficient as possible; Android software uses garbage collection for memory management, a technique that requires more RAM to achieve equivalent performance.


This quote doesn’t really cover why M1 Macs are more efficient with less RAM than Intel Macs. You’ve got a memory budget, it’s likely broadly the same on both platforms, and the speed at which your retains/releases happen isn’t going to be the issue. It’s not like Intel Macs use GC where the M1 uses RC.

(It explains why iOS does better with less RAM than Android, but the quote is specifically claiming this as a reason for 8GB of RAM to be acceptable.)


I doubt the M1 Macs are really using memory much more efficiently. The stock M1 Macs with 8GB were available rapidly, while the Macs with 16GB of RAM or larger disks had a three to four week delay when ordering, so a lot of enthusiasts and influencers rushed out and got base models; they are then surprised to find they can work OK in most apps with only 8GB.

Perhaps they never really needed to fit 32GB into their Intel Macs either. Some days after the glowing reviews, and the strange comments about magic memory utilization, we now see comments concerned about SSD wear due to swap file usage.

If the applications and data structures are more compact in memory on the ARM processors, it should be easy to test: you just need an Intel Mac and an M1 Mac running the same app on the same document, and then look at how much memory each uses.


When you need 32 or 64GB of RAM, it's not because the data structures or the programs themselves need that much memory; it's because the data they use (database content, virtual machines, images, videos, music...) fills that RAM, and that data is not going to take up less space on an ARM machine.


However, real-world use cases involving such massive amounts of data are rare for the typical desktop user. Massive database loads usually happen on specialized servers, where 256GB of RAM and more is pretty mundane.

So on consumer PCs, RAM is more often used for caching mechanisms or eaten away by poorly designed memory leaks/garbage collection.

And if your GPU is able to do real-time rendering on data-heavy loads, maybe you need less caching of intermediate results as well.


There are plenty of use cases for more than 8GB of RAM. When you're doing data analysis on even smaller datasets, you may need several times more available memory than the size of the dataset as you're processing it.


1. Again, the typical use case for an entry-level PC is not data analysis on big data.

2. My current production server is a PostgreSQL database on a 16GB RAM VM running on Debian (my boss is stingy). This doesn't prevent me from managing a 300GB+ data cluster with pretty decent performance and performing actual data analysis.

3. If Chrome sometimes uses 8GB+ for, for goodness' sake, a web browser, the only explanation is poor design; there is no excuse.


"Nobody needs more than 640k of RAM"


that was not my point, my point was that if you need it, you will need it regardless of the architecture.


I think you’re right. I’ve only ever needed 32GB when I was running a local Hadoop cluster for development. Those virtual images required the same amount of RAM regardless of OS.


It's a contributing factor. If things like retain/release are fast and you have significantly more memory bandwidth and low latency to throw at the problem, you can get away without preloading and caching nearly as much. Take something simple like images on web pages: don't bother keeping hundreds (thousands?) of decompressed images in memory for all of the various open tabs. You can just decompress them on the fly as needed when a tab becomes active and then release them when it goes inactive and/or when the browser/system determines it needs to free up some memory.
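To make that concrete, here's a minimal Swift sketch of the decode-on-demand idea (all the type and method names are hypothetical, not any real browser's code); it leans on the real DispatchSource memory-pressure API to drop decoded bitmaps when the system asks for memory back:

  import Dispatch
  import Foundation

  // Hypothetical cache: keep only compressed bytes resident, decode lazily,
  // and throw decoded copies away when the system reports memory pressure.
  final class LazyImageCache {
      private var compressed: [URL: Data] = [:]   // always resident (small)
      private var decoded: [URL: Data] = [:]      // stand-in for decoded bitmaps
      private let pressure = DispatchSource.makeMemoryPressureSource(
          eventMask: [.warning, .critical], queue: .main)

      init() {
          pressure.setEventHandler { [weak self] in
              // RAM is tight: drop everything we can cheaply re-create.
              self?.decoded.removeAll(keepingCapacity: false)
          }
          pressure.resume()
      }

      func store(_ bytes: Data, for url: URL) { compressed[url] = bytes }

      // Called when a tab becomes active and actually needs the pixels.
      func decodedImage(for url: URL) -> Data? {
          if let bitmap = decoded[url] { return bitmap }
          guard let bytes = compressed[url] else { return nil }
          let bitmap = decode(bytes)              // placeholder decode step
          decoded[url] = bitmap
          return bitmap
      }

      private func decode(_ bytes: Data) -> Data { bytes } // real code would decompress
  }

The trade is exactly the one described above: cheap recompute plus fast memory in exchange for a smaller resident set.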


You've completely changed the scope of what's being discussed, though. Retain/release being faster would just surface as regular performance improvements. It won't change anything at all about how an existing application manages memory.

It's possible that apps have been completely overhauled for a baseline M1 experience. Extremely, extraordinarily unlikely that anything remotely of the sort has happened, though. And since M1-equipped Macs don't have any faster IO than what they replaced (disk, network, and RAM speeds are all more or less the same), there wouldn't be any reason for apps to have done anything substantially different.


From the original article:

Third, Marcel Weiher explains Apple’s obsession about keeping memory consumption under control from his time at Apple as well as the benefits of reference counting:

>where Apple might have been “focused” on performance for the last 15 years or so, they have been completely anal about memory consumption. When I was there, we were fixing 32 byte memory leaks. Leaks that happened once. So not an ongoing consumption of 32 bytes again and again, but a one-time leak of 32 bytes.

>The benefit of sticking to RC is much-reduced memory consumption. It turns out that for a tracing GC to achieve performance comparable with manual allocation, it needs several times the memory (different studies find different overheads, but at least 4x is a conservative lower bound). While I haven’t seen a study comparing RC, my personal experience is that the overhead is much lower, much more predictable, and can usually be driven down with little additional effort if needed.


But again that didn't change with M1. We're talking MacOS vs. MacOS here. Your quote is fully irrelevant to what's being discussed which is the outgoing 32gb macbook vs the new 16gb-max ones. They are running the same software. Using the same ObjC & Swift reference counting systems.


We've come full circle here.

ARC is not specific to the M1, BUT it has been widely used in ObjC & Swift for years AND is thus heavily optimized on the M1, which performs "retain and release" way faster (even when emulating x86).

A perfect illustration of Apple's long-term software+hardware strategy.


That still doesn't mean that M1 Macs use less memory. If retain/release is faster, then the M1 Macs have higher performance than Intel Macs; that is easily understood. The claim under contention here is that M1 Macs use less memory, which is not explained by hardware-optimized atomic operations.


And I never stated that. It's just more optimized.


Ok. However the posts in this thread were asking how the M1 Macs could use less RAM than Intel Macs, not if they were more optimized. The GP started with:

>This quote doesn’t really cover why M1 macs are more efficient with less ram than intel macs? You’ve got a memory budget, it’s likely broadly the same on both platforms


Well, if less memory is used to store garbage thanks to RC, less memory is needed. But that was largely discussed in other sub-comments hence why we focused more on the optimisation aspect in this thread.


>Well, if less memory is used to store garbage thanks to RC, less memory is needed

But both Intel Macs and ARM Macs use RC. Both chips are running the same software.


Aren't most big desktop apps, like Office on PC, still written in C++? Same with almost all AAA games. And the operating system itself.

Browsers are written in C++, and JavaScript has a full-blown GC.

I don't see how refcounting gives you an advantage over manual memory management for most users.


Decompression is generally bound by CPU speed, not memory bandwidth or latency.


CPU speed is often bound by memory bandwidth and latency... it's all related. If you can't keep the CPU fed, it doesn't matter how fast it is theoretically.


What I mean is that (to my understanding) memory bandwidth in modern devices is already high enough to keep a CPU fed during decompression. Bandwidth isn't a bottleneck in this scenario, so raising it doesn't make decompression any faster.


RAM bandwidth limitations (latency and throughput) are generally hidden by the multiple layers of cache between the RAM and the CPU prefetching more data than is generally needed. Having memory on chip could make the latency lower, but as ATI has shown with HBM memory on a previous generation of its GPUs, it's not a silver-bullet solution.

I am going to speculate now, but maybe, just maybe, if some of the silicon Apple has used on the M1 is dedicated to compression/decompression, they could be transparently compressing all RAM in hardware. Since this is offloaded from the CPUs and allows a compressed stream of data from memory, they would achieve greater RAM bandwidth, lower latency and less usage for a given amount of memory. If this is the case I hope that the memory has ECC and/or the compression has parity checking....


> I am going to speculate now, but maybe, just maybe, if some of the silicon that apple has used on the M1 is used for compression/decompression they could be transparently compressing all ram in hardware. Since this offloaded from the CPUs and allows a compressed stream of data from memory, they achieve greater ram bandwidth, less latency and less usage for a given amount of memory.

Are you aware of any x86 chips that utilize this method?


Not that I am aware of. I remember seeing Apple doing something like it in software on the Intel Macs, which is why I speculated about it being hardware on the M1.

Cheers


> Blosc [...] has been designed to transmit data to the processor cache faster than the traditional, non-compressed, direct memory fetch approach via a memcpy() OS call. Blosc is the first compressor (that I'm aware of) that is meant not only to reduce the size of large datasets on-disk or in-memory, but also to accelerate memory-bound computations (which is typical in vector-vector operations).

https://blosc.org/pages/blosc-in-depth/


I can't speak to the MacOS system, but from years spent JVM tuning: you're in a constant battle finding the right balance of object creation/destruction (the former burning CPU, the latter creating garbage), keeping memory use down (more collection, which burns CPU and can create pauses and hence latency), or letting memory balloon (which can eat resource, and makes the memory sweeps worse when they finally happen).

Making it cheaper to create and destroy objects with hardware acceleration, and to do many small, low-cost reclaims without eating all your CPU would be a magical improvement to the JVM, because you could constrain memory use without blowing out CPU. From what's described in TFA it sounds like the same is true for modern MacOS programming.


Manual memory management isn't magic and speeding up atomic ops doesn't fundamentally change anything. People have to spend time tuning memory management in C++ too, that's why the STL has so many ways to customise allocators and why so many production C/C++ codebases roll custom management schemes instead of using malloc/free. They're just expensive and slow so manual arena destruction etc is often worth it.

The JVM already makes it extremely cheap to create and destroy objects: creation is always ~free (just a pointer increment), and then destruction is copying, so very sensitive to memory bandwidth but done in parallel. If most of your objects are dying young then deallocation is "free" (amortized over the cost of the remaining live objects). Given the reported bandwidth claims for the M1 if they ever make a server version of this puppy I'd expect to see way higher GC throughput on it too (maybe such a thing can be seen even on the 16GB laptop version).

The problem with Java on the desktop is twofold:

1. Versions that are mostly used don't give memory back to the OS even if it's been freed by the collector. That doesn't start happening by default until like Java 14 or 15 or so, I think. So your memory usage always looks horribly inflated.

2. If you start swapping it's death because the GC needs to crawl all over the heap.

There are comments here saying the M1 systems rely more heavily on swap than a conventional system would. In that case ARC is probably going to help. At least unless you use a modern pauseless GC where relocation is also done in parallel. Then pausing background threads whilst they swap things in doesn't really matter, as long as the app's current working set isn't swapped out to compensate.


Yea, this is a BS theory. I have a 16GB M1 MacBook Air and the real answer is that it has super fast SSD access, so you don’t notice the first few gigabytes of swap.

But when swap hits 8-9GB, its effects start to get very noticeable.


This seems correct. RC vs GC might explain how a Mac full of NSObjects needs less memory than a Windows machine full of .NET runtimes, but it doesn’t explain how an M1 Mac with 16GB of RAM is faster than an x86 Mac with 16GB or more of RAM.

Besides, a lot of memory usage is in web browsers, which must use garbage collection.

Looking at the reviews of M1 Macs, those systems are still responsive and making forward progress at a “memory pressure” that would make my x86 Mac struggle in a swap storm. It seems to come down to very fast access to RAM and storage, large on-die caches, and perhaps faster memory compression.


Oh one more thing, they said in the Apple Silicon event that they had eliminated a lot of the need for copying RAM around so … could be some actual footprint reduction there?


I tend to agree! I think Big Sur on M1 uses a 16kB page size vs 4kB on Intel, so maybe that contributes to more efficient / less noticeable performance issues when swapping.


Yeah, it's a bit of a stretch. To the extent that macOS apps use garbage collection less than PC apps, they would need less RAM, but they are kind of hopping around a macOS vs Android comparison, which makes no sense. I think it's Mac enthusiasts trying to imagine why a max of 8 or 16GB is OK. It is OK for most people anyway.


It also would have no difference between the outgoing Intel ones and the incoming Apple Silicon ones. Same pointer sizes, same app memory management, etc... Some fairly minor differences in overall binary sizes, so no "wins" there or anything either.


All Swift/ObjC software has been doing ARC for ten (?) years. Virtual memory usage will be the same under M1. It will just pay off in being faster to refcount (ie as fast as it already is on an iPhone), and therefore the same software runs faster. Probably won't work under Rosetta 2 with the per-thread Total Store Ordering switch. And it's probably not specific to NSObject, any thread safe reference counter will benefit. There are more of those everywhere these days.

2 more points:

- All the evidence I've seen is gifs of people opening applications in the dock, which is... not impressive. I can do that already, apps barely allocate at all when they open to "log in to iCloud" or "Safari new tab". And don't we see that literally every time Apple launches Mac hardware? Sorry all tech reviewers everywhere, try measuring something.

- I think the actual wins come from the zillion things Apple has done in software. Like: memory compression, which come to think of it might be possible to do in hardware. Supposedly a lot of other work/tuning done on the dynamic pager, which is maybe enabled by higher bandwidth more than anything else.

Fun fact: you can stress test your pager and swap with `sudo memory_pressure`. Try `-l critical`. I'd like to see a benchmark comparing THAT under similar conditions with the previous generation.


You might try reading the article. One example was a large software build taking ~25% less time on a 13" MBP than on a 12-core Mac Pro.

I'm curious about FP/vector performance, but I'm pretty sure it's fine. I'm definitely eyeing a MBP myself! 20 hours of video playback? Crazy...


Right, which wasn't a test of the purportedly increased ability to work in high memory pressure at all.


> macos vs android comparison which makes no sense.

Because the M1 is similar to the chips used in iOS, hence the comparison is not inappropriate.


All comparisons are appropriate, but the question here was whether the Mac laptops' memory limits were somehow made better by more efficient use of memory. They are not. These are laptops, not phones or tablets; memory is used as efficiently as in previous laptops.


The quote right after explains your concerns.

>The memory bandwidth on the new Macs is impressive. Benchmarks peg it at around 60GB/sec–about 3x faster than a 16” MBP. Since the M1 CPU only has 16GB of RAM, it can replace the entire contents of RAM 4 times every second. Think about that…


Yes, reading some more of the discussions it seems like the answer is that (roughly) the same amount of memory is used, but hitting swap is no longer a major problem, at least for user-facing apps. Seems like the original quote is reading too much into the retain/release thing.


If the same amount of memory is used, then wouldn't swap usage be the same?


Yeah, but swapping suddenly isn't as big a problem as before (probably). You had to be very careful not to hit thrashing on x86_64; now you don't have to worry so much.

Or that's how I understand this, I don't actually own an M1 Mac.


First, there is no sensible reason why RAM bandwidth would differ by 3x -- it's LPDDR4X either way -- and you can't replace it from an SSD that fast; the SSD would limit swap speed.


What?

https://www.anandtech.com/show/16252/mac-mini-apple-m1-teste...

> Besides the additional cores on the part of the CPUs and GPU, one main performance factor of the M1 that differs from the A14 is the fact that it’s running on a 128-bit memory bus rather than the mobile 64-bit bus. Across 8x 16-bit memory channels and at LPDDR4X-4266-class memory, this means the M1 hits a peak of 68.25GB/s memory bandwidth.

The point of the memory bandwidth is so that it never has to swap to disk in the first place.


> The point of the memory bandwidth is so that it never has to swap to disk in the first place.

What? How does memory bandwidth obviate the need for disk swapping?


How does memory bandwidth prevent it needing to swap to disk?


By swap speed I think he meant that the bottleneck is the time that it takes to move data from the SSD to the RAM, not how fast the RAM can be read by the processor.


Sounds like some form of GDDR instead of plain DDR. Not only faster, but I bet simultaneously accessible from both the CPU and GPU.


> Not only faster, but I bet simultaneously accessible from both the CPU and GPU.

It is. We know they're using a unified memory architecture, they pointed it out in the presentation.
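Metal's shared storage mode has exposed this on the programming side for a while (also on Intel Macs with integrated GPUs); a minimal sketch of what "unified" looks like from code:

  import Metal

  // One allocation, visible to both CPU and GPU (no explicit copy or upload).
  guard let device = MTLCreateSystemDefaultDevice(),
        let buffer = device.makeBuffer(length: 1024, options: .storageModeShared)
  else { fatalError("no Metal device") }

  // The CPU writes directly into the buffer's backing memory...
  let floats = buffer.contents().bindMemory(to: Float.self, capacity: 256)
  for i in 0..<256 { floats[i] = Float(i) }

  // ...and a compute kernel bound via setBuffer(buffer, offset: 0, index: 0)
  // reads the very same bytes, with no staging copy across a PCIe bus.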


Unified doesn't necessarily mean dual ported.


Due to how DRAM works, the array itself cannot have more than one port, and almost certainly even the DRAM chip as a whole is still some variation of single-ported SDRAM (a long time ago there were various pseudo-dual-ported DRAM chips, but these were only really useful for framebuffer-like applications). But given that there are multiple levels of cache in the SoC, it is a somewhat moot point.


LPDDR4X (the RAM chips in the M1) comes in dual-port flavors.


I suspect you mean they come with two channels on a single chip, which is not the same as two ports. Channels access separate bits of memory. Ports access the same bits of memory.


Nope. Nothing too magical, I suspect. It says LPDDR4 right there in System Profiler, although it does not indicate the frequency.


This reminds me that the Xbox Series X/PS5 use unified GDDR for higher memory bandwidth. I'm curious whether such a design could help x86 catch up with the M1?


x86 has been doing unified memory for integrated GPUs since at least 2015. It's not a new thing, see for example https://bit-tech.net/news/tech/cpus/amd-huma-heterogeneous-u...

The reason GDDR isn't typically used for system RAM is that it's higher latency & more power hungry. Like, the GDDR6 memory on a typical discrete card alone uses more power than an entire M1-powered Mac Mini.


Thanks, I hadn't realized the latency/power difference of GDDR memory.


No, it's normal mobile RAM.


The M1 has 4 LPDDR4X chips inside the CPU package running at 4266 MHz and showing some of the best latencies I've seen.

What "normal" laptop has that?


I believe the poster meant "normal" in the sense that it's a conventional memory technology for laptops. (ie LPDDR4 not GDDR like had been suggested above).


Being inside the CPU package should allow for less than a 1% improvement in latency by my napkin math. It's got good latency because it is a top-of-the-line mobile RAM setup, but that isn't unique to the M1.


I 100% agree, but I've audited my code, and on other platforms my code closely agrees with LMbench's lat_mem_rd, which seems pretty well regarded for accuracy.

The latency appears real to me.


Someone please correct me for the sake of all of us if I’m wrong, but it sounds like Apple is using specialized hardware for “NSObject” retain-and-release operations, which may bypass/reduce the impact on general RAM.


On recent Apple Silicon CPUs most uncontended atomic operations are essentially free - almost identical in speed to the non-atomic version of the same operation. Reference counting must be atomic-safe whether using ARC or MRR. On x86 systems those atomic operations impose a performance cost. On Apple Silicon they do not. It does not change how much memory is used, but it does mean you can stop worrying about the cost of atomic operations. It has nothing to do with the ARMv8 instruction set; it has to do with how the underlying hardware implements those operations and coordinates among cores.

Separately from that x86's TSO-ish memory model also imposes a performance cost whether your algorithm needs those guarantees or not. Code sometimes relies on those guarantees without knowing it. Absent hardware support you would need to insert ARM atomics in translated code to preserve those guarantees which on most ARM CPUs would impose a lot of overhead. The M1 allows Rosetta to put the CPU into a memory ordering mode that preserves the expected memory model very efficiently (as well as using 4K page size for translated processes).
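To make the retain/release point concrete, this is roughly what ARC inserts on your behalf in Swift (a sketch; the optimizer can elide some of these, and the actual runtime entry points are swift_retain/swift_release or objc_retain/objc_release). The counter updates have to be atomic, which is exactly the operation being discussed:

  final class Widget {
      var value = 0
  }

  func use(_ w: Widget) { _ = w.value }

  func demo(_ a: Widget) {
      let b = a      // ARC emits a retain here: an atomic increment of the
                     // object's reference count
      use(b)
      // before demo() returns, ARC emits a release: an atomic decrement plus
      // a check that frees the object once the count reaches zero
  }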


> On recent Apple Silicon CPUs uncontended most atomic operations are essentially free - almost identical in speed to the non-atomic version of the same operation.

They are fast for atomics but still far, far slower than the equivalent non-atomic operation. An add takes around half a cycle (an upper bound - with how wide the Firestorm core is, an add is almost certainly less than half a cycle). At 1GHz a cycle is 1 nanosecond; the M1 runs at around 3GHz. So you're still talking about the atomic operation being >10x slower than non-atomics.

Which should not be surprising at all. Apple didn't somehow invent literal magic here. They still need coherency across 8 cores, which means at a minimum L1 is bypassed for the atomic operation. The L2 latency is very impressive, contributing substantially to that atomic operation performance. But it's still coming at a very significant cost. It's very, very far from free. There's also no ARM vs. x86 difference here, since the atomic necessarily forces a specific memory ordering guarantee that's stricter than x86's default. Both ISAs are forced to do the same thing and pay the same costs.


> So you're still talking the atomic operation being >10x slower than non-atomics.

How did you arrive at this number?


> How did you arrive at this number?

It's in the post. Half a cycle for an add or less, and cycles are every 1/3 nanosecond. So upper bound for an add would be around 1/6th a nanosecond. Likely less than that still yet, since the M1 is probably closer to an add in 1/8th a cycle not 1/2. Skylake by comparison is at around 1/4th a cycle for an add, and since M1's IPC is higher it's not going to be worse at basic ALU ops.

6 nanoseconds @ 3ghz is 18 cycles. That's on the slow end of the spectrum for a CPU instruction.


Where? 6 nanoseconds is pretty long, that’s about how long it’d take to do an entire retain/release pair, which is a couple dozen instructions I believe.


I don't think that's quite right. Apple believes strongly in retain-and-release / ARC. It has designed its software that way; it has designed its M1 memory architecture that way. The harmony between those design considerations leads to efficiency: the software does things in the best way possible, given the memory architecture.

I'm not an EE expert and I haven't torn apart an M1, but Occam's Razor would suggest it's unlikely they made specialized hardware for NSObjects specifically. Other ARC systems on the same hardware would likely see similar benefits.


I suspect that Apple didn't do anything special to improve the performance of reference counting apart from not using x86. Simply put, the x86 ISA and memory model are built on the assumption that atomic operations are mostly used as part of some kind of higher-level synchronization primitive and not for their direct result.


M1 is faster at retain/release under Rosetta2 than x86, and yet Rosetta2 still has the same strong memory model that x86 does.


One thing is that the M1 has incredible memory bandwidth and is implemented on a single piece of silicon (which certainly helps with low-overhead cache consistency). Another thing is that Rosetta certainly does not have to preserve the exact behavior of x86 (and in fact it cannot, because doing so would negate any benefits of dynamic translation); it only has to care about what can be observed by the user code running under it.


The hardware makes uncontended atomics very fast, and Objective-C is a heavy user of those. But it would really help any application that could use them, too.


But GCd languages don't need to hit atomic ops constantly in the way ref-counted Objective C does, so making them faster (though still not as fast as regular non-atomic ops) is only reducing the perf bleeding from the decision to use RC in the first place. Indeed GC is generally your best choice for anything where performance matters a lot and RAM isn't super tight, like on servers.

Kotlin/Native lets us do this comparison somewhat directly. The current and initial versions used reference counting for memory management. K/N binaries were far, far slower than the equivalent Kotlin programs running on the JVM and the developer had to deal with the hassle of RC (e.g. manually breaking cycles). They're now switching to GC.

The notion that GC is less memory efficient than RC is also a canard. In both schemes your objects have a mark word of overhead. What does happen though, is GC lets you delay the work to deallocate from memory until you really need it. A lot of people find this quite confusing. They run an app on a machine with plenty of free RAM, and observe that it uses way more memory than it "should" be using. So they assume the language or runtime is really inefficient, when in reality what's happened is that the runtime either didn't collect at all, or it collected but didn't bother giving the RAM back to the OS on the assumption it's going to need it again soon and hey, the OS doesn't seem to be under memory pressure.

These days on the JVM you can fix that by using the latest versions. The runtime will collect and release when the app is idle.


I saw that point brought up on Twitter and I don't know how it makes more efficient use of RAM.

Specifically, as I understood it, Apple software (written in Objective-C/Swift) uses a lot of retain/release (or Atomic Reference Counting) on top of manual memory, for memory management, rather than other forms of garbage collection (such as those found in Java/C#), which gives Objective-C programs a lower memory overhead (supposedly). This is why the iPhone ecosystem is able to run so much snappier than the Android ecosystem.

That said, I don't see how that translates to lower memory usage than x86 programs. I think the supporting quotes he used for that point are completely orthogonal. I don't have an M1 mac, but I believe the same program running on both machines should use the same amount of memory.


The only thing I can think of that would actually reduce memory usage on M1 vs the same version of MacOS on x86 would be if they were able to tune their compressed memory feature to run faster (with higher compression ratio) on the M1. That would serve to reduce effective memory usage or need to fall back to swap. I would not expect something like that to be responsible for more than, say, a 5-10% RAM usage decrease though.


> That would serve to reduce effective memory usage or need to fall back to swap. I would not expect something like that to be responsible for more than, say, a 5-10% RAM usage decrease though.

I think you can reach a lot more than that. Presumably, on Intel they use something like LZO or LZ4, since it compresses/decompresses without too much CPU overhead. But if you have dedicated hardware for something like e.g. Brotli or zstd, one could reach much higher compression ratios.

Of course, this is assuming that memory can be compressed well, but I think this is true in many cases. E.g. when selecting one of the program/library files in the squash benchmarks:

https://quixdb.github.io/squash-benchmark/

you can observe higher compression ratios for e.g. Brotli/gzip/deflate than LZO/LZ4.


I suspect Apple is using their own LZFSE[0] compression, perhaps now with special tweaks for M1. The reason I only suspected a 5-10% improvement though is that even if it's able to achieve a massive increase in compression ratio (compressing 3GB to 1GB instead of 2GB, say), that's still only saving 1GB total. Which I guess isn't nothing and is more than 10% on an 8GB machine.

[0] https://en.m.wikipedia.org/wiki/LZFSE
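For anyone who wants to check the ratios on their own data, LZFSE and LZ4 are both exposed through Foundation's NSData.compressed(using:) on macOS 10.15+; a quick sketch (the dictionary file path is just an example input):

  import Foundation

  // Compare LZ4 (fast) vs LZFSE (better ratio) on an arbitrary input file.
  do {
      let blob = try Data(contentsOf: URL(fileURLWithPath: "/usr/share/dict/words"))
      let algos: [(String, NSData.CompressionAlgorithm)] = [("lz4", .lz4), ("lzfse", .lzfse)]
      for (name, algo) in algos {
          let squeezed = try (blob as NSData).compressed(using: algo) as Data
          let ratio = Double(blob.count) / Double(squeezed.count)
          print(name, "ratio:", String(format: "%.2f", ratio))
      }
  } catch {
      print("compression test failed:", error)
  }

Whether the compressed-memory subsystem itself uses LZFSE (or something tuned per-architecture) is, as said above, speculation.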


For a very long time, budget Android devices that were one or even two generations older were faster than just-released iPhones at launching new apps to interactivity (see: hundreds of YouTube "speed comparison" videos). This was purely due to better software, as the iPhone had significantly faster processors and I/O. RAM doesn't play a big factor at launch. One very minor contributor would be that the GC doesn't need to kick in until later, while ARC is adding its overhead all the time.

Edit: apparently, this isn't common knowledge.

https://www.youtube.com/watch?v=emPiTZHdP88

https://youtu.be/hPhkPXVxISY

https://youtu.be/B5ZT9z9Bt4M


Ridiculous that you're downvoted. There are a lot of people posting here who haven't worked on memory management subsystems.

GC vs RC is not a trivial comparison to make, but overall there are good reasons new systems hardly use RC (Objective-C dating back to the 90s isn't new). Where RC can help is where you have a massive performance cliff on page access, i.e. if you're swapped to disk. Then GC is terrible because it'll try and page huge sections of the heap at once where as RC is way more minimal in what it touches.

But in most other scenarios GC will win a straight up fight with an RC based system, especially when multi-threading gets involved. RC programs just spend huge amounts of time atomically incrementing and decrementing things, and rummaging through the heap structures, whereas the GC app is flying along in the L1 cache and allocations are just incrementing a pointer in a register. The work of cleaning up is meanwhile punted to those spare cores you probably aren't using anyway (on desktop/mobile). It's tough to beat that by hand with RC, again, unless you start hitting swap.

If M1 is faster at memory ops than x86 it's because they massively increased memory bandwidth. In fact I'd go as far as saying the CPU design is probably not responsible for most of the performance increase users are seeing. Memory bandwidth is the bottleneck for a lot of desktop tasks. If M1 core is say 10% faster than x86 but you have more of them and memory bandwidth is really 3x-4x larger, and the core can keep far more memory ops in flight simultaneously, that'll explain the difference all by itself.


Indeed, one of the articles cited by the article that this discussion is about (https://blog.metaobject.com/2020/11/m1-memory-and-performanc...) links to a paper saying that GC needs 4x the memory to match manual memory management and then makes a huge wacky leap to say that ARC could achieve that with less. It can't. ARC will always be slower than manual memory management because it behaves the same way as naive manual memory management with some overhead on top.

On the other hand, that same paper shows that for every single one of their tested workloads, the generational GC outperforms manual memory management. Now obviously, you could do better with manual memory management if you took the time to understand the memory usage of your application to reduce fragmentation and to free entire arenas at a time, but for applications that don't have the developer resources to apply to that (the vast majority), the GC will win.

I'm not saying that better memory management is the reason Android wins these launch to interactivity benchmarks because the difference is so stark relative to the hardware performance that memory management isn't nearly enough to explain it, but it does contribute to it. (My own guess is that most of the performance difference comes from smarter process initialization from usage data. Apple is notoriously bad at using data for optimization.)


ARC stands for Automatic Reference Counting.

https://en.wikipedia.org/wiki/Automatic_Reference_Counting


TFA says the hardware optimization for ARC is to the point of being bespoke. Hardware will always beat software optimization. Further, the other GCs have much higher RAM overheads than this combined, bespoke system.

Apple has decades of proven experience producing and shipping massively over-engineered systems. I believe 'em when they say these processors do ARC natively.


I’m not denying that Apple has better ARC performance. It’s that I don’t understand how an application would use less memory on ARM than x86. I’d expect the ARM code to run faster (as a result of being able to do atomic operations faster), but I don’t see how that translates to less memory usage


In one of those random “benchmark tests” online where someone opened several applications on an M1 Mac with 8GB RAM and did some work on those, they kept Activity Monitor open alongside and pointed to the increase in swap at some stage. So it seems like the swap is fast enough and is used more aggressively. That reduces the amount of RAM used at any point in time. The data in RAM has also benefited from compression in macOS for several years now.


Read up on the performance overhead of GC across other languages. They’re messy and can lock up periodically. They take up significant ram and resources.


We're talking about non-GC apps on x86 vs. the same non-GC apps on Arm.


Even those still do memory management - and usually poorly.


It makes Objective-C and Swift memory management faster, but it doesn't reduce RAM usage at all. (Maybe a wee bit less bandwidth used.)


If memory is released as soon as possible instead of waiting for the next GC cycle, doesn't that make it more efficient?


Yes!

Hopefully I can clear up the discussion a little:

Q: Does reference counting 'use' less RAM than GC?

A: Yes (caveats etc. go here, but your question is a good explanation)

Q: Does the M1 in and of itself require less RAM than x86 processors?

A: No

Q: So why are people talking about the M1 and its RAM usage as if it's better than with x86?

A: It's really just around the faster reference counting. MacOS was already pretty efficient with RAM.

I'd like to propose tokamak-teapot's formula for hardware purchase:

Minimum RAM believed to be required = actual amount of RAM required * 2

N.B. I am aware that a sum that's greater than 16GB doesn't magically become less than 16GB, but it is somewhat surprising how well MacOS performs when it feels like RAM should be tight, so I'd suggest borrowing a Mac or making a Hackintosh to experience this if you're anxious about hitting the ceiling.
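A tiny Swift illustration of the "released as soon as possible" point above: with reference counting, deinit runs deterministically once the last reference goes away, so the memory can be reused immediately instead of waiting for a future collection pass.

  final class Buffer {
      let bytes: [UInt8]
      init(size: Int) { bytes = [UInt8](repeating: 0, count: size) }
      deinit { print("buffer freed") }
  }

  func work() {
      let big = Buffer(size: 64 * 1024 * 1024)
      _ = big.bytes.count
      // "buffer freed" prints before work() returns (ARC may even release right
      // after the last use); a tracing GC would typically reclaim this memory at
      // some later collection cycle instead.
  }

  work()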


There is no "next GC cycle"; Objective-C and Swift use reference counting on every platform (there was an abortive GC attempt on the desktop a few years back, but it never saw wide use and has been deprecated since Mountain Lion).


These are not GC apps; they are reference counted.


It's ARMv8.x atomic instructions.


Their hardware is almost certainly specialized for reference counting, but I would be surprised if they had a custom instruction or anything.


The post kind of does:

> The benefit of sticking to RC is much-reduced memory consumption. It turns out that for a tracing GC to achieve performance comparable with manual allocation, it needs several times the memory (different studies find different overheads, but at least 4x is a conservative lower bound).

It implies that ref-counting is more economical in terms of wasted memory than GC, with the tradeoff being performance. This is solved thanks to the M1.


Apple's runtimes have always used RC; this is not a change between Intel and ARM.


Nope, they also tried a tracing GC for Objective-C on the desktop, but it was a failure due to interoperability across libraries compiled in different modes alongside C semantics.

Then they pivoted into automating retain/release patterns from Cocoa and sold it, Apple style, as a victory of RC over tracing GC, while moving the GC related docs and C related workarounds into the documentation archive.


> Nope, they also tried a tracing GC for Objective-C on the desktop

Operative word: tried. GC was an optional desktop-only component deprecated in Mountain Lion, which IIRC has not been accepted on the MAS since 2015 and was removed entirely in Sierra.

Without going into irrelevant weeds, "apple has always used refcounting everywhere" is a much closer approximation.


Operative word: failed.

Which Apple then, in typical "you are holding it wrong" style, turned around into a huge marketing message, while hiding away the tracing GC efforts.


> Operative word: failed.

That's not exactly relevant to the subject at hand of what memory-management method software usually uses on macos.

> Which Apple then, in typical "you are holding it wrong" style, turned around into a huge marketing message, while hiding away the tracing GC efforts.

Hardly?

And people are looking to refcounting as a reason why Apple software is relatively light on memory, which is completely fair and true, and e.g. generally assumed to be one of the reasons why iOS devices fare well with significantly less RAM than equivalent Android devices. GCs have significant advantages, but memory overhead is absolutely one of the drawbacks.


Nope, as proven by performance tests,

https://github.com/ixy-languages/ixy-languages

And the fact that M1 has special instructions dedicated to optimize RC,

https://blog.metaobject.com/2020/11/m1-memory-and-performanc...

Memory overhead in languages with tracing GC (RC is a GC algorithm) only happens in languages like Java without support for value types.

If the language supports value types, e.g. D, and there is still memory overhead versus RC, then fire the developers or they better learn to use the language features available on their plate.


> Nope, as proven by performance tests,

> https://github.com/ixy-languages/ixy-languages

This shows latency, not memory consumption, as far as I can tell.

> If the language supports value types, e.g. D, and there is still memory overhead versus RC, then fire the developers or they better learn to use the language features available on their plate.

Memory overhead of certain types of garbage collectors (notably generational ones) is well-known and it's specified relative to the size of the heap that they manage. Using value types is of course a valid point, regarding how you should use the language, but it doesn't change the overhead of the GC, it just keeps the heap it manages smaller. If the overhead was counted against the total memory use of a program, then we wouldn't be talking about the overhead of the garbage collector, but more about how much the garbage collector is actually used. Note that I'm not arguing against tracing GCs, only trying to keep it factual.


I think the author doesn't understand what Gruber wrote here. Android uses more memory because most Android software is written to use more memory (relying on garbage collection). It has nothing to do with the chips. If you ran Android on an M1, it wouldn't magically need less RAM. And Photoshop compiled for x86 is going to use about the same amount of memory as Photoshop compiled for Apple silicon. Sure, if you rewrote Photoshop to use garbage collection everywhere then memory consumption would increase, but that has nothing to do with the chip.


Maybe I misread, but I understood that more as Apple using ARC and that gives them a memory advantage. M1 is simply making that more efficient by doing retain-release faster. But I agree that should not change total memory usage.

But I think in general you could say that Apple has focused more on optimizing their OS for memory usage than the competition may have done. Android uses Java which eats memory like crazy and I suspect C# is not that much better being a managed and garbage collected language. Not sure how much .NET stuff is used on Windows, but I suspect a lot.

macOS, in contrast, is really dominated by Objective-C and Swift, which do not use these memory-hungry garbage collection schemes, nor require JIT compilation, which also eats memory.


> I suspect C# is not that much better being a managed and garbage collected language

C# is better than JVM in that it has custom value types.

Say you want to allocate an array of points in Java: you basically have to allocate an array of pointers, all pointing to tiny 8-byte objects (e.g. 32-bit float x and y coords), plus the overhead of an object header each. If you use C# and structs, it just allocates a flat array of floats with zero overhead.

Not only do you pointlessly use memory, you have indirection lookup costs, potential cache misses, more objects for GC to traverse, etc. etc.

JVM really sucks at this kind of stuff and so much of GUI programming is passing around small structs like that for rendering.

FWIW I think they are working on some proposal to add value types to JVM but that probably won't reach Android ever.
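
To make the layout difference concrete, here is a rough sketch in C (names and sizes invented purely for illustration): the "boxed" version is roughly what an array of Java objects looks like in memory, the flat version is what an array of value types looks like.

    #include <stdlib.h>

    typedef struct { float x, y; } Point;        /* value type: 8 bytes */

    typedef struct {
        long header;                              /* stand-in for an object header */
        float x, y;
    } BoxedPoint;

    int main(void) {
        /* "struct style": one flat, contiguous allocation (~8 KB for 1000 points) */
        Point *flat = malloc(1000 * sizeof *flat);

        /* "boxed style": an array of pointers, each to its own heap allocation
           (~16+ KB plus allocator overhead, plus a pointer chase on every access) */
        BoxedPoint **boxed = malloc(1000 * sizeof *boxed);
        for (int i = 0; i < 1000; i++)
            boxed[i] = malloc(sizeof *boxed[i]);

        for (int i = 0; i < 1000; i++)
            free(boxed[i]);
        free(boxed);
        free(flat);
        return 0;
    }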


I had a C# project that was too slow and used too much RAM.

I can attest that structs use less memory; however, IIRC they don't have methods, so no GetHashCode(), which made them way too slow to insert into a HashSet or Dictionary.

In the end I used regular objects in a Dictionary. RAM usage was a bit higher than structs (not unbearably so) but speed improvement was massive.


1. Structs can have methods.

2. The primary value of value types is not to use less RAM (you just save a pointer, I guess times two because of GC) but the ability to avoid having to GC the things, since they are either on the stack or in contiguous chunks of memory, and to leverage CPU caches, as you can iterate over contiguous data rather than hopping around in the heap. Iterating over contiguous data can be a large constant factor faster than over a collection of pointers to heap objects.


> I can attest that structs use less memory; however, IIRC they don't have methods, so no GetHashCode(), which made them way too slow to insert into a HashSet or Dictionary

You can and should implement IEquatable on a struct, especially if you plan on placing them in a HashSet: the default implementation will use reflection and will be slow, but it's easy to override.


I just checked, and apparently structs can have methods. Is that a new thing, or was I just ignorant?


You could always have methods (for as long as I can remember at least; I started using .NET in the 3.0 days), you just can't inherit structs or use virtual methods because structs don't have a virtual method table. You can, however, implement interfaces and override operators. It's very nice for implementing 3D graphics math primitives like vectors and matrices, way better than Java in this regard, which was what got me into C# way back then.


If your hardware enables more regular/efficient garbage collection, then it absolutely can lower memory consumption.

Given that the M1 chip was designed to better support reference counting, it makes sense that doing the same for GC could lead to a benefit.


> "M1 and memory efficiency"

Hi folks!

It looks like my blog post[1] was the primary source for this (it's referenced both by this post and by the Gruber post), and to be clear, I did not claim that this helps ARM Macs use less RAM than Intel Macs. I think John misunderstood that part and now it has blown up a bit...

I did claim that this helps Macs and iPhones use less RAM than most non-Apple systems, as part of Apple's general obsessiveness about memory consumption (really, really obsessive!). This part of the puzzle is how to get greater convenience for heap allocation.

Most of the industry has settled on tracing GCs, and they do really well in microbenchmarks. However, they need a lot of extra RAM to be competitive on a system level (see references in the blog post). OTOH, RC tends to be more frugal and predictable, but its Achilles heel, in addition to cyclic references, has always been the high cost of, well, managing all those counts all the time, particularly in a multithreaded environment where you have to do this atomically. Turns out, Apple has made uncontended atomic access about as fast as a non-atomic memory access on M1.

This doesn't use less RAM; it decreases the performance cost of using the more frugal RC. As far as I can tell, the "magic" of the whole package comes down to a lot of these little interconnecting pieces, your classic engineering tradeoffs, which have non-obvious consequences over there and then let you do this other thing over here, which compensates for the problem you caused in this other place but gets a lot out of it, etc. Overall, I'd say a focus on memory and power.

So they didn't add special hardware for NSObject, but they did add special hardware that also tremendously helps NSObject reference counting. And apparently they also added a special branch predictor for objc_msgSend(). 8-). Hey, 16 billion transistors, what's a branch predictor or two among friends.. ¯\_(ツ)_/¯

[1] https://blog.metaobject.com/2020/11/m1-memory-and-performanc...


Thanks for the clarification; it seems this post has gotten lost in the comments. You could try making a new post on your blog and submitting it to HN as a new post.


I can't understand why less RAM would be enough specifically on Apple Silicon rather than Intel. Has that argument been proven?

RAM capacity is just RAM capacity. Possibly Swift-made apps use less RAM compared to other apps, but the microarchitecture shouldn't matter.


> RAM capacity is just RAM capacity. Possibly Swift-made apps use less RAM compared to other apps, but the microarchitecture shouldn't matter.

My guess is that it's mostly faster swapping.

Microarchitecture could help, perhaps by making context switches faster.

But it could also be custom peripheral/DMA logic for handling swapping between RAM and NVM.

I think it makes sense. NVM should be fast enough that RAM only needs to act as more of a cache. But existing architectures have a lot of legacy around treating NVM like just a hard drive. Intel is also working on this with its Optane-related architecture work.

You could also do on-the-fly compression of some kinds of data to/from RAM. But I haven't heard any clues that the M1 is doing that, and you'd need applications to give hints about what data is compressible.


I also believe this is the case.

Most experiments with 8GB M1 Macs I've seen so far (on YouTube) seem to start slowing down once the data cannot fit in RAM, although the rest of the system remains responsive, e.g. the 8K RED RAW editing test. In the same test with 4K RED RAW there was some stuttering on the first playback, but subsequent playbacks were smooth, which I guess was a result of swapped data being moved back into RAM.

My guess would be they've done a lot of optimization on swap, making swapping less of a performance penalty (as ridiculous as it sounds, I guess they could even use the Neural Engine to determine what should be put back into RAM at any given moment to maximize responsiveness).

macOS has been doing memory compression since Mavericks using the WKdm algorithm, but it has also supported a Hybrid Mode[1] on ARM/ARM64 using both WKdm and a variant of LZ4 for quite some time (WKdm compresses much faster than LZ4). I wouldn't be too surprised if the M1 has some optimization for LZ4. So far I haven't seen anybody test it.

It might be interesting to test M1 Macs with vm_compressor=2 (compression with no swap) or vm_compressor=8 (no compression, no swap)[2] and see how it runs. I'm not sure if there's a way to change bootargs on M1 Macs, though.

[1]: https://github.com/apple/darwin-xnu/blob/a449c6a3b8014d9406c...

[2]: https://github.com/apple/darwin-xnu/blob/a449c6a3b8014d9406c...


Exactly. Several reports since the launch have pointed out that both the memory bandwidth of the RAM and the SSD speeds are higher than before (by 2-3x in both cases, I think?).

Combined, that should make a big difference to swapping.


I don't quite see that either. But I suspect that it is just macOS itself which uses less memory than Windows in general.

But the much faster SSD-to-RAM transfers on the M1 mean that shuffling stuff in and out of RAM is much quicker, so the amount of RAM matters less.


On my almost-stock (primarily used for iOS deployment) MBP, Catalina uses 3GB of RAM (active+wired) on boot. That's much more than my Linux laptop (~400MB). I haven't booted Win10 recently, but I'd assume it'd be close to macOS.

The transfer speeds of the M1 SSDs have been benched at 2.7GB/s - about the same speed as mid-range NVMe SSDs (my ADATA SX8200 Pro and Sabrent Rocket are both faster and go for about $120/TB).


I think the OS's own usage isn't the main factor for RAM. Browsers, Electron apps, professional apps like Photoshop and IntelliJ, and VMs are.

I expected the SSD to be way faster, but benchmarks say its SSD does a bit below 3000MB/s read/write: not very fast, just the usual Gen3 x4 speed.


I’m guessing swapping happens quicker, perhaps due to the unified memory architecture. With quicker swapping you’d be less likely to notice a delay.

That said, I'd still be very hesitant to buy an 8 GB M1 Mac. When my iMac only had 8 GB it was a real pain to use. Increasing my iMac's memory to 24 GB made it usable.


The bottleneck for swapping in/out must be the SSD, not memory. Also, its SSD isn't fast compared to other NVMe SSDs in either throughput or IOPS. Possibly latency is great thanks to the SSD controller being integrated into the M1, but I don't think it changes the game.


Is an SSD suitable for swap, given its limited number of write cycles?


When I looked into this, the information I found suggested that modern consumer SSDs generally have more than enough write cycles to spare for any plausible use case. Possibly this was more of an issue five to ten years ago.


I, for one, would love to see an M1 chip cripple itself under the memory load of 4 IntelliJ windows and Chrome tabs, among other applications.


That's my take as well; I have a fairly modern MacBook and it's just fine; it's just that the software I run on it is far from ideal.

Intellij has a ton of features but it's pretty heavyweight because of it; I'd like a properly built native IDE with user experience speed at the forefront. That's what I loved about Sublime Text. But I also like an IDE for all the help it gives me, things that the alternative editors don't do yet.

I've used VS Code for a while as well, it's faster than Atom at least but it's still web technology which feels like a TON of overhead that no amount of hardware can compensate for.

I've heard of Nova, but apparently I installed it and forgot to actually work with the trial so I have no clue how well it works. I also doubt it works for my particular needs, which intellij doesn't have a problem with (old codebase with >10K LOC files of php 5.2 code and tons of bad JS, new codebase with Go and Typescript / React).


If you want a faster IntelliJ experience, you can try disabling built-in plug-ins and features you don't need; power saving mode is one quick way to disable heavyweight features.


The problem with IntelliJ is the JVM: memory usage is just off the charts. And then combine it with the Gradle daemon, and it gets very hairy.


Remember Lisp machines? The M1 is a Swift machine.


I'm wondering if the "optimized for reference-counting" thing applies to other languages too. i.e. if I write a piece of software in Rust, and I make use of Rc<>, will Macs be extra tolerant of that overhead? In theory it seems like the answer should be yes


I sure hope so. In macOS 10.15, the fast path for a retain on a (non-tagged-pointer) Obj-C object does a C11 relaxed atomic load followed by a C11 relaxed compare-and-exchange. This seems pretty standard for retain-release and I'd expect Rust's Rc<> to be doing something similar. It's possible Apple added some other black magic to the runtime in 10.16 (and they haven't released the 10.16 objc sources yet) but it's hard to imagine what they'd do that makes more sense than just optimizing for relaxed atomic operations.
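
In C11 terms, that fast path looks roughly like this (a simplified sketch for illustration, not the actual objc4 code, which also packs the count into spare isa bits as discussed in the replies):

    #include <stdatomic.h>
    #include <stdint.h>

    /* Simplified sketch of a load + CAS retain fast path (not Apple's code). */
    void retain(_Atomic uint64_t *refcount_word) {
        uint64_t old = atomic_load_explicit(refcount_word, memory_order_relaxed);
        uint64_t desired;
        do {
            desired = old + 1;   /* the real code also checks for overflow here */
        } while (!atomic_compare_exchange_weak_explicit(
                     refcount_word, &old, desired,
                     memory_order_relaxed, memory_order_relaxed));
    }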


(Rc does not use atomics, Arc does and from a peek at the source it does indeed use Relaxed)


I didn't understand why the implementation wouldn't just do an atomic increment, but I guess Obj-C semantics provide too much magic to permit such a simple approach. The actual code, in addition to [presumably] not being inlined, does not seem easy to optimize at the hardware level: https://github.com/apple/swift-corelibs-foundation/blob/main...

The native Swift retain (swift_retain above) seems to be somewhere inside this mess: https://github.com/apple/swift/blob/main/stdlib/public/runti...


What you’ve linked is CoreFoundation’s retain, Objective-C’s can be found in https://opensource.apple.com/source/objc4/objc4-787.1/runtim... (look for objc_object::rootRetain).

The short answer for why it can’t just be an increment is because the reference count is stored in a subset of the bits of the isa pointer, and when the reference count grows too large it has to overflow into a separate sidetable. So it does separate load and CAS operations in order to implement this overflow behavior.
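
A toy illustration of why that is (bit positions invented here, not the real objc4 layout):

    #include <stdbool.h>
    #include <stdint.h>

    /* Pretend the top bits of the isa word hold an inline reference count,
       plus a flag meaning "the count has overflowed into a side table". */
    #define RC_SHIFT      56
    #define RC_ONE        (1ULL << RC_SHIFT)
    #define RC_MASK       (0xFFULL << RC_SHIFT)
    #define HAS_SIDETABLE (1ULL << 55)

    /* Why retain can't be a bare atomic add: the increment must not carry
       out of the packed count bits, so the count has to be inspected first
       and the slow (side table) path taken when it is full. */
    bool try_inline_retain(uint64_t isa, uint64_t *new_isa) {
        if ((isa & HAS_SIDETABLE) || (isa & RC_MASK) == RC_MASK)
            return false;            /* caller must take the side-table slow path */
        *new_isa = isa + RC_ONE;     /* bump the packed inline count */
        return true;
    }

The real rootRetain wraps a check like this inside the load/CAS loop mentioned above.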


I’d think the bigger win would have to be in the release part of the code, which actually cares about contention.


No, because Rc<> isn't atomic. Arc<>, however, would get the benefit. The reason retain/release in ObjC/Swift are so much faster here is because they are atomic operations.


So, useful for structural sharing between threads.


Yes, it applies to everything which uses atomics and is not something special in the runtime. It's also worth noting that this is an optimization that iPhones have had for the last few years, ever since they switched to arm64.


The fast atomics are relatively new.


If anything, an Objective-C machine: they apparently have a special branch predictor for objc_msgSend()!

But they learned the lesson from SOAR (Smalltalk On A RISC) and did not follow the example of LISP machines, Smalltalk machines, Java machines, Rekursiv, etc. and build specialized hardware and instructions. The benefits of the specialization are much, much less than expected, and the costs of being custom very high.

Instead, they gave the machine large caches and tweaked a few general purpose features so they work particularly well for their preferred workloads.

I wonder if they made trap on overflow after arithmetic fast.


But how? It has an Arm CPU - how does it differ from any other machine with a 64 bit Arm CPU?


Remember, ARM is an instruction set, i.e. an interface, not an implementation. Arm Holdings does license its designs to other companies, though, so I don't know how much Apple silicon would have in common with, say, a Qualcomm CPU. They may be totally different under the hood.


Yes, Apple's microarchitecture is indeed completely their own, while Qualcomm uses rebranded/tweaked Arm Cortex cores.


Basically reference counting requires grabbing a number from memory in one step, then increasing or decreasing it and storing it in a second step.

This is two operations and in-between the two -- if and only if the respective memory location is shared between multiple cores or caches -- some form of synchronization must occur (like locking a bank account so you can't double draft on two ATMs simultaneously).

Now the way this is implemented varies a bit.

Apple controls most of the hardware ecosystem, programming languages, binary interface, and so on meaning there is opportunity for them to either implement or supplement ARM synchronization or atomicity primitives with their own optimizations.

There is nothing really preventing Intel from improving here as well; it is just easier on ARM because the ISA has different assumptions baked in, and Apple controls everything upstream, such as the compiler implementations.
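
In code, the difference between that naive two-step update and the atomic version looks something like this (illustrative C, not any particular runtime):

    #include <stdatomic.h>

    /* The naive two-step update described above: grab the number, bump it,
       store it back. Two cores interleaving here can lose an increment. */
    void retain_unsynchronized(long *count) {
        long tmp = *count;     /* step 1: read the count  */
        *count = tmp + 1;      /* step 2: write it back   */
    }

    /* The atomic read-modify-write version: the hardware makes the whole
       update indivisible. This is the operation whose cost is at issue. */
    void retain_atomic(_Atomic long *count) {
        atomic_fetch_add_explicit(count, 1, memory_order_relaxed);
    }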


The M1 is much more than a CPU, and a CPU is much more than an instruction set


I know it's a fast Arm CPU - I've read the Anandtech analysis etc - and that there is lots of extra hardware on the SoC. But the specific point was why is it a Swift machine. What makes it particularly suited to running Swift?


The exact reason is out of my depth, but the original quote makes it clear that there is something. Memory bandwidth would be one possibility.


My guess:

1. "weak" memory ordering (atomic Acquire/Release loads/stores)

2. low memory latencies between cache and system memory (so dirty pages in caches are updated faster, etc.)

3. potentially a coherency implementation optimized for this kind of atomic access (purely speculative: e.g. maybe sometimes updating less than a page in a cache when detecting certain atomic operations which changed a page, or maybe w.r.t. the window marked for exclusive access in the context of LL/SC operations and similar)

Given that it's common for an object to stay in the same thread, I'm not sure how much 2 matters for this point (but it does matter for general perf.). But I guess there is a lot in 3, where especially with low-latency RAM you might be able to improve performance for these cases.


These are interesting points. I'd like to hazard a guess that the leading contributor is cache-related. Just looking at https://en.wikipedia.org/wiki/Reference_counting suggests as much: "Not only do the operations take time, but they damage cache performance and can lead to pipeline bubbles."

I roughly understand how refcounting causes extra damage to cache coherency: anywhere that a refcount increment is required, you mutate the count on an object before you use it, and then decrement it later. Often times, those counting operations are temporally distant from the time that you access the object contents.

I do not really understand the "pipeline bubbles" part, and am curious if someone can elaborate.

Reading on in the wiki page, they talk about weak references (completely different than weak memory ordering referenced above). This reminds me that Cocoa has been making ever more liberal use of weak references over the years, and a lot of iOS code I see overuses them, particularly in blocks. I last looked at the objc implementation years ago, but it was some thread safe LLVM hash map split 8 or 16 ways to reduce lock contention. My takeaway was roughly, "wow that looks expensive". So while weak refs are supposed to be used judiciously, and might only represent 1% or less of all refs, they might each cost over 100x, and then I could imagine all of your points could be significant contributors.

In other words, weak references widen the scope of this guessing game from just "what chip changes improve refcounting" to "what chip changes improve parallelized, thread safe hash maps."


The "pipeline bubbles" remark refers to the decoding unit of a processor needing to insert no-ops into the stream of a processing unit while it waits for some other value to become available (another processing unit is using it). For example, say you need to release some memory in a GC language: you would just drop the reference while the pipeline runs at full speed (leaving it for the garbage collector to figure out). In a refcount situation, you need to decrease the refcount. Since more than one processing unit might be incrementing and decrementing this refcount at the same time, this can lead to a hot spot in memory where one processing unit has to bubble for a number of clock cycles until the other has finished updating it. If each refcount modify takes 8 clock cycles, then refcounting can never update the same value more than once per 8 cycles. In extreme situations, the decoder might bubble all processing units except one while that refcount is updated.

For the last few decades the industry has generally believed that GC lets code run faster, although it has drawbacks in terms of being wasteful with memory and unsuitable for hard-realtime code. Refcounting has been thought inferior, although it hasn't stopped the Python folks and others from being successful with it. It sounds like Apple uses refcounting as well and has found a way to improve refcounting speed, which usually means some sort of specific silicon improvement.

I'd speculate that moving system memory on-chip wasn't just for fewer chips, but also for decreasing memory latency. Decreasing memory latency by having a cpu cache is good, but making all of ram have less latency is arguably better. They may have solved refcounting hot spots by lowering latency for all of ram.

From Apple's site:

"M1 also features our unified memory architecture, or UMA. M1 unifies its high-bandwidth, low-latency memory into a single pool within a custom package. As a result, all of the technologies in the SoC can access the same data without copying it between multiple pools of memory." That is paired with a diagram that shows the cache hanging off the fabric, not the CPU.

That says to me that, similar to how traditionally the cpu and graphics card could access main memory, now they have turned the cache from a cpu-only resource into a shared resource just like main memory. I wonder if the GPU can now update refcounts directly in the cache? Is that a thing that would be useful?


Extremely low memory latency is another. It also has 8 memory channels; most desktops have 2. It's an aggressive design; Anandtech has a deep dive. Some of the highlights: lower-latency cache, larger reorder buffer, more in-flight memory operations, etc.


Isn't a typical desktop a 64-bit bus, meaning this would be 4 channels because it's 128-bit?


Typical desktops have two 64-bit DIMMs on either 2 channels (64 bits wide each) or 1 channel (128 bits wide).

The M1 Macs seem to be 8 channels x 16 bits, which is the same bandwidth as a desktop (although running the RAM at 4266 MHz is much higher than usual). The big win is you can have 8 cache misses in flight instead of 2. With 8 cores, 16 GPU cores, and 16 ML cores I suspect the M1 has more in-flight cache misses than most.


> or 1 channel (128 bits wide)

The DDR4 bus is 64-bit, how can you have a 128-bit channel??

Single channel DDR4 is still 64-bit, it's only using half of the bandwidth the CPU supports. This is why everyone is perpetually angry at laptop makers that leave an unfilled SODIMM slot or (much worse) use soldered RAM in single-channel.

> The big win is you can have 8 cache misses in flight instead of 2

Only if your cache line is that small (16 bit) I think? Which might have downsides of its own.


> The DDR4 bus is 64-bit, how can you have a 128-bit channel??

Less familiar with the normal on laptops, but most desktop chips from AMD and Intel have two 64 bit channels.

> Which might have downsides of its own.

Typically, for each channel you send an address (a row and column, actually), wait for the DRAM latency, and then get a burst of transfers (one per bus cycle) of the result. So for a 16-bit-wide channel @ 3.2 GHz with a 128-byte cache line you get 64 transfers, one every 0.3125 ns, for a total of 20ns.

Each channel operates independently, so multiple channels can each have a cache miss in flight. Otherwise nobody would bother with independent channels and just stripe them all together.

Here's a graph of cache line throughput vs number of threads.

https://github.com/spikebike/pstream/blob/master/png/apple-m...

So with 1 or 2 threads you see an increase in throughput; the multiple channels are helping. 4 threads is the same as two; maybe the L2 cache has a bottleneck. But 8 threads is clearly better than 4.


> two 64 bit channels

Yeah, I'm saying you can't magically unify them into a single 128-bit one. If you only use a single channel, the other one is unused.


It's pretty common for hardware to support both. On the Zen 1 Epycs, for instance, some software preferred the consistent latency of striped memory over the NUMA-aware latency of separate channels, where the closer DIMMs have lower latency and the further DIMMs higher.

I've seen similar on Intel servers, but not recently. This isn't, however, typically something you can do at runtime, just at boot time, at least as far as I've seen.


I don't know anything about memory access.

But doesn't that only help if you have parallel threads doing independent 16 bit requests? If you're accessing a 64 bit value, wouldn't it still need to occupy four channels?


Depends. Cache lines are typically 64-128 bytes long, and depending on various factors they might be on one memory channel or spread across multiple memory channels, somewhat like a RAID-0 disk. I've seen servers (Opterons, I believe) that would allow mapping memory per channel or across channels based on settings in the BIOS. Generally, non-NUMA-aware OSs ran better with striped memory and NUMA-aware OSs ran better non-striped.

So striping a cache line across multiple channels does increase bandwidth, but not by much. If the DRAM latency is 70ns (not uncommon) and your memory is running at 3.2 GHz on a single 64-bit-wide channel, you get 128 bytes in 16 transfers. 16 transfers at 3.2GHz = 5ns. So you get a cache line back in 75ns. With 2 64-bit channels you can get 2 cache lines per 75ns.

So now with a 128 bit wide channel (twice the bandwidth) you wait 70ns then get 8 transfers @ 3.2GHz = 2.5ns. So you get a cache line back in 72.5ns. Clearly not a big difference.

So the question becomes: for a complicated OS with a ton of cores, do you want one cache line per 72.5ns (the striped config) or two cache lines per 75ns (the non-striped config)?

In the 16-bit, 8-channel config (assuming the same bus speed and latency) you get 8 cache lines per 90ns. However, I'm not sure what magic Apple has, but I'm seeing very low memory latencies on the M1, on the order of 33ns! With all cores busy I'm seeing cache line throughput of about a cache line per 11ns.
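
A tiny toy program with the same assumed numbers (70ns DRAM latency, one transfer per cycle at 3.2GHz, 128-byte lines), just to make the per-channel arithmetic explicit:

    #include <stdio.h>

    int main(void) {
        /* All numbers assumed, mirroring the figures above. */
        const double latency_ns = 70.0;
        const double transfer_ns = 1000.0 / 3200.0;   /* one transfer per 3.2GHz cycle */
        const int cacheline_bytes = 128;
        const int widths_bits[] = { 16, 64, 128 };

        for (int i = 0; i < 3; i++) {
            int transfers = cacheline_bytes / (widths_bits[i] / 8);
            printf("%3d-bit channel: %.1f ns per cache line\n",
                   widths_bits[i], latency_ns + transfers * transfer_ns);
        }
        return 0;   /* prints 90.0, 75.0 and 72.5 ns respectively */
    }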


I believe modern superscalar architectures can run instructions out of order if they don't rely on the same data, so when paused waiting for a cache miss, the processor can read ahead in the code, and potentially find other memory to prefetch. I may be wrong about the specifics, but these are the types of tricks that modern CPUs employ to achieve higher speed.


Sure, but generally a cache line miss will quickly stall things; sure, you might have a few non-dependent instructions in the pipeline, but running a CPU at 3+GHz and waiting 70ns is an eternity. Doubly so when you can execute multiple instructions per cycle.


You have to consider that a DRAM delivers a burst, not a single word. Usually the channel width times the burst length equals the cache line size.


I doubt this is relevant, but a typical desktop has 2x64-bit channels and the M1 has either 4x32-bit or 8x16-bit channels.


Anandtech claims to have identified 8 x 16-bit channels on the die, and the part number is compatible with 4 LPDDR4X chips that support 2 channels each.


Memory bandwidth costs a lot of power, so I think the idea is to reduce the need for unnecessary memory ops, both saving power and reducing latency.


That's true, but for it to be a "Swift machine" as mentioned above would imply some kind of ISA-level design choices, as opposed to "just" being extremely wide or having a branch predictor that understands what my favourite food is.


I think they were using hyperbole to make a point.


I doubt Apple allows anyone with the knowledge to speak about it.


This is where it's worth pointing out that Apple is an ARM architecture licensee. They're not using a design from ARM directly, they're basically modifying it however it suits them.


Indeed, they’re an ISA licensee, and I don’t think they’re using designs from ARM at all. They beat ARM to the first ARM64 core back in 2013 with the iPhone 5s.


I don't think this applies to good software. Nobody will retain/release something in a tight loop. And typical retain/releases don't consume much time. Of course it improves metrics like any other micro-optimization, so it's good to have it, but that's about it.


Retains and releases often happen across function call boundaries.


Taking that as true for a moment, I wonder what other programming languages get a benefit from Apple's silicon then? PHP et al. use reference counting too, do they get a free win, or is there something particular about Obj-C and Swift?


Not the specific thing mentioned in the article. There is other hardware that optimizes Objective-C specifically, but it’s a branch hint.


any links or other info about that?

sounds really cool



That's really impressive... and really vindicates the whole "if you're serious about software, make your own hardware too" idea.


Android phones are built on managed code, but PCs are built mostly on C/C++ (almost all productivity apps, browsers, games, the operating system itself). And the only GC'd code most people run is garbage collected on Apple too: it's JavaScript on the web.

I'm not familiar with macOS; are the apps there mostly managed code? Even if they were, and even if refcounting on Mac is that much faster than refcounting on PC, refcounted code would still lose to manual memory management on average.


How does this work? Isn't reference counting a lot of +1 and -1?


It is a lot of atomic +1 and -1, meaning possible thread contention, meaning that no matter how many cores your hardware has, you have a worst-case scenario where all your atomic reference-counted objects have to be serialized, slowing everything down. I do not know how Objective-C/Swift deals with this normally, but making that operation as fast as possible in the hardware can have huge implications in real life, as evidenced by the new Macs.


On Intel, atomically incrementing a counter is always sequentially consistent. ARM might get away with weaker barriers.


It's a lot of +1/-1 on atomic variables, guarded using atomic memory operations (mainly with Acquire/Release ordering), on memory which might be shared between threads.

So low latency between the cache and system RAM can help here, at least for cases where the Rc is shared between threads. But it also helps if the Rc is not shared between threads but the owning thread is moved to a different CPU. Still, it's probably not the main reason.

Given how atomics (might) be implemented on ARM, and that the cache and memory are on the same chip, my main guess is that they did some optimizations in the coherency protocol/implementation (which keeps the memory between caches and the system memory/RAM coherent). I believe there is a bit of potential to optimize for RC, i.e. to make that usage pattern of atomics fast. Lastly, they probably took special care that the atomic-related instructions used by Rc are implemented as efficiently as possible (mostly fetch_add/fetch_sub).
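
For a concrete picture of the acquire/release usage being described, here is the usual shape of a shared refcount's release path in C11 (a simplified sketch of what Arc-style counters generally do, not Apple's code):

    #include <stdatomic.h>
    #include <stdlib.h>

    typedef struct {
        _Atomic long refcount;
        /* ... payload ... */
    } Shared;

    void shared_release(Shared *s) {
        /* Decrement with release ordering so this thread's writes to the payload
           are visible to whichever thread ends up doing the destruction. */
        if (atomic_fetch_sub_explicit(&s->refcount, 1, memory_order_release) == 1) {
            /* Acquire fence pairs with the release decrements of other threads
               before we touch and free the payload. */
            atomic_thread_fence(memory_order_acquire);
            free(s);
        }
    }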


Just to be clear, the RAM/memory and cache are not on the same chip/die/silicon. They are part of the same packaging though.

> which keeps the memory between caches and the system memory/RAM coherent

Isn't this already true of every multi-core chip ever designed; the whole point of coherency is to keep the RAM/memory coherent between all the cores and their caches.


Oh, right.

> Isn't this already true of every multi-core chip ever designed;

Yes, I just added the explanation of what coherency is in this context as I'm not sure how common the knowledge about it is.

The thing is there are many ways how you can implement this (and related things) with a number of parameters involved which probably can be tuned to optimize for typical RC's usage of atomic operations. (Edit: Just to be clear there are constraints on the implementation imposed by it being ARM compatible.)

A related example (not directly atomic fetch_add/fetch_sub, and not directly coherency either) would be the way LL/SC operations are implemented. Mainly, on ARM you have a parameter for how large the memory region "marked for exclusive access" (by an LL load operation) is. This can have major performance implications, as it directly affects how likely a conditional store is to fail because of accidental interference.


> running on silicon optimized to make reference counting as efficient as possible

I'm curious to understand this. Is this because of a specific instruction set support, or just the overall unified memory architecture?


> This, in a nutshell, helps explain why iPhones run rings around even flagship Android phones,

For the price, it better run circles and squares. It should cook my dinner too.


A new 2020 iPhone SE costs US$399 and outperforms flagship Android phones that cost much more:

https://www.androidcentral.com/cheapest-iphone-has-more-powe...

https://www.androidauthority.com/iphone-se-vs-most-powerful-...


At the hardware level, does this mean they have a much faster TLB than competing CPU's, perhaps optimized to patterns in which NSObjects are allocated? Speaking of which, does Apple use a custom malloc or one of the popular implementations like C malloc, tcmalloc, jemalloc, etc.?


Apple uses its own libmalloc: https://opensource.apple.com/tarballs/libmalloc/


I don't think this really makes sense. How many of the benchmarks that people have been running are written in Objective-C? They're mostly hardcore graphics and maths workloads that won't be retaining and releasing many NSObjects.


I agree. I think it's typical of cargo-culting: explanations don't need to make sense, it's all about the breathless enthusiasm.

Look, want to know how the M1 achieves its results? Easy. Apple is first with a 5nm chip. Look at the past: every CPU maker gains both speed and power efficiency when going down a manufacturing node.

Intel CPUs were still using a 14nm node (although they call it 14nm+++) while the Apple M1 is now at 5nm. According to this [1] chart, that's at least 4x the transistor density.

I'm not saying Apple has no CPU design chops; they've been at it for their phones for quite a while. But people are just ignoring the elephant in the room: Apple gives TSMC a pile of cash to be exclusive for mass production on their latest 5nm tech.

   [1] https://www.techcenturion.com/7nm-10nm-14nm-fabrication#nbspnbspnbspnbsp7nm_vs_10nm_vs_12nm_vs_14nm_Transistor_Densities


The bit about reference counting being the reason that Macs and iOS devices get better performance with less ram makes no sense. As a memory management strategy, reference counting will always use more ram because a reference count must be stored with every object in the system. Storing all of those reference counts requires memory.

A reference counting strategy would be more efficient in processor utilization compared to garbage collection as it does not need to perform processor intensive sweeps through memory identifying unreferenced objects. So reference counting trades memory for processor cycles.

It is not true that garbage collection requires more ram to achieve equivalent performance. It is in fact the opposite. For programs with identical object allocations, a GC based system would require less memory, but would burn more CPU cycles.


“A reference counting strategy would be more efficient in processor utilization compared to garbage collection as it does not need to perform processor intensive sweeps through memory identifying unreferenced objects. So reference counting trades memory for processor cycles.”

I think it’s the reverse.

Firstly, garbage collection (GC) doesn’t identify unreferenced objects, it identifies referenced objects (GC doesn’t collect garbage). That’s not just phrasing things differently, as it means that the amount of garbage isn’t a big factor in the time spent in garbage collection. That’s what makes GC (relatively) competitive, execution-time wise. However, it isn’t competitive in memory usage. There, consensus is that you need more memory for the same performance (https://people.cs.umass.edu/~emery/pubs/gcvsmalloc.pdf: with five times as much memory, an Appel-style generational collector with a non-copying mature space matches the performance of reachability-based explicit memory management. With only three times as much memory, the collector runs on average 17% slower than explicit memory management)

(That also explains why iPhones can do with so much less memory than phones running Android)

Secondly, the textbook implementation of reference counting (RC) in a multi-processor system is inefficient because modifying reference counts requires expensive atomic instructions.

Swift programs spend about 40% of their time modifying reference counts (http://iacoma.cs.uiuc.edu/iacoma-papers/pact18.pdf)

So, reference counting gets better memory usage at the price of more atomic operations = less speed.

That last PDF describes a technique that doubles the speed of RC operations, decreasing that overhead to about 20-25%.

It wouldn’t surprise me if these new ARM macs use a similar technique to speed up RC operations.

It might also help that the memory model of ARM is weaker than that of x64, but I’m not sure that’s much of an advantage for keeping reference counts in sync across cores.


> reference counting will always use more ram because a reference count must be stored

True, reference counting stores reference counts… but garbage collection stores garbage, which is typically bigger :)

(Unless you’re thinking of a language where the GC gets run after every instruction - but I’m not aware of any that do that, all the ones I know of run periodically which gives garbage time to build up)


No, many garbage collection approaches WILL require more RAM; some need twice as much RAM to run efficiently. Then there is the fact that garbage collection can have a delay, which lets garbage pile up, thus using more memory than necessary. The retain-release used by Apple is not as efficient, but you reclaim memory faster. https://www.quora.com/Why-does-Garbage-Collection-take-a-lot...


I think it's more a design pattern you see with garbage-collected languages, where people instantiate objects with pretty much every action and then let the GC handle the mess afterward. Every function call involves first creating a parameters object, populating it, then forgetting about it immediately afterward.

I've seen this with java where the memory usage graph looks like a sawtooth, with 100s of MB being allocated and then freed up a couple of seconds later.


Isn't it the case with Java that it will do this because you do have the memory to spend on it? Generally this "handling the mess afterward" involves some kind of nursery or early generation these days, but its size may be use-case-dependent. If tuned for an 8/16 GB environment, presumably the "sawtooth" wouldn't need to be as tall.


Yes, but the reason it does it is to make it fast. The shorter your sawtooth spokes, the slower your app.


Slower in what metrics? Latency? Throughput? Not to mention that the behavior may strongly depend on the GC design and the HW platform in question. It seems far too difficult to make a blanket statement about what is and isn't achievable in a specific use case.


> reference counting will always use more ram because a reference count must be stored with every object in the system

In the tracing GCs I have seen, an "object header" must be stored with every object in the system; the GC needs it to know which parts of the object are references which should be traced. So while reference counting needs extra space to store the reference count, tracing GC needs extra space to store the object header.
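
A rough sketch of the per-object bookkeeping being compared (layouts invented for illustration; real runtimes pack this information differently):

    #include <stdint.h>

    typedef struct {
        uint64_t refcount;       /* per-object cost of reference counting */
        /* ... fields ... */
    } RCObject;

    typedef struct {
        uint64_t gc_header;      /* per-object cost of a tracing GC: type/shape
                                    info so the collector can find the pointers
                                    inside, mark bits, etc. */
        /* ... fields ... */
    } GCObject;

So the per-object overhead is comparable; the larger memory difference argued about in this thread is heap headroom, i.e. the extra free space a tracing collector needs to run efficiently, versus RC freeing objects as soon as the count hits zero.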

