>Given that Hackintoshers are a particular bunch who don’t take kindly to the Apple-tax[...]
I have zero issues with an Apple premium or paying a lot for hardware. I think a major generator of interest in hackintoshes has been that there are significant segments of computing that Apple has simply completely (or nearly completely) given up on, including essentially any non-AIO desktop system above the Mini. At one point they had quite competitive PowerMacs and then Mac Pros covering the range of $2k all the way up to $10k+, and while sure there was some premium there was feature coverage, and they got regular yearly updates. They were "boring", but in the best way. There didn't need to be anything exciting about them. The prices did steadily inch upward, but far more critically sometime between 2010 and 2012 somebody at Apple decided the MP had to be exciting or something and created the Mac Cube 2, except this time to force it by eliminating the MP entirely. And it was complete shit, and to zero surprise never got a single update (since they totally fucked the power/thermal envelope, there was nowhere to go) and users completely lost the ability to make up for that. And then that was it, for 6 years. Then they did a kind of sort of ok update, but at a bad point given that Intel was collapsing, and forcing in some of their consumer design in ways that really hurt the value.
The hackintosh, particularly virtualized ones in my opinion (running macOS under ESXi deals with a ton of the regular problem spots), has helped fill that hole as frankenstein MP 2010s finally hit their limits. I'm sure Apple Silicon will be great for a range of systems, but it won't help in areas that Apple just organizationally doesn't care about/doesn't have the bandwidth for, because that's not a technology problem. So I'm a bit pessimistic/wistful about that particular area, even though it'll be a long time before the axe completely falls on it. It'll be fantastic and it's exciting to see the return of more experimentation in silicon, but at the same time it was a nice dream for a decade or so to be able to freely take advantage of a range of hardware the PC market offered which filled holes Apple couldn't.
Not only that, though. Enthusiasts are also extremely fickle and quick to jump ship to a cheaper hardware offering. If you look at all of Apple’s other markets, you’ll see loads of brand loyalty. Fickle enthusiasts don’t fit the mould.
Considering Apple has only gone after those who profiteered by selling pre-built Hackintoshes, and not everyone who is profiting from the Hackintosh scene, I would say Apple did care about the Hackintosh community in some way.
I thought the higher performance/price of Hackintoshes, especially with Ryzen, might force Apple to act differently, but now with the M1, Apple needn't worry about Hackintosh performance/price anymore.
Looking over the shoulder at a 64-core Threadripper with 256GB of ECC RAM, 3090FE, Titan RTX and Radeon VII, yeah right. Some of us do Hackintoshing because we want more dope specs than what Apple offers and customizability that comes with PC hardware.
If you're a big enough org or the app is for internal use, this might not be an option anyway. At that point I imagine most people just give up on it and figure out how to run macOS on a generic VM. But at that point you have to convince your IT department that it's worth it doing a thing that is definitely unsupported and in violation of the TOS.
Or maybe some of these are big enough that they are able to approach Apple and get a special license for N concurrent instances of macOS running on virtualized hardware? Who knows.
Adobe is smaller by contrast but I'd speculate has a much deeper relationship with Apple as well
I care to the extent customers care.
The answer that most people would like to see would be a stripped down, non-GUI macOS that's installable at no cost in virtualization environments, or maybe with some evaluation scheme like Windows Server has, which effectively makes it free for throwaway environments like build agents.
That's called "Darwin" and it's theoretically open source, but there doesn't seem to be a useful distribution of it. Whether that's due to lack of community interest or lack of Apple support is the question.
This perception strikes me as having warped in from a different decade. Nowadays, at least in my neck of the woods, developers almost universally use laptops, and Apple's still plenty competitive in the (high end) laptop department.
For the most part, the only developers I know who still use desktops are machine learning folks who don't like the cloud and instead keep a Linux tower full of GPUs in a closet somewhere. And then remote into it from a laptop. Half the time it's a MacBook, half the time it's a XPS 13. And they were never going to consider a Mac Pro for their training server, anyway, because CUDA.
I couldn't speak to power users, but my sense is that, while it meant something concrete in the '90s, nowadays it's a term that only comes out when people want to complain about the latest update to Apple's line of computers.
But I don't think that we need to beat a dead horse like that. The more interesting one would be to figure out some interesting and non-trivially-sized cross-section of people who both need a workstation-class computer, and have the option of even considering using OS X for the purpose.
The last time I used it (the last MBP with Ethernet built in; I want to say 2012 or 2013?), some of the features "missing" in Bootcamp were:
- No EFI booting. Instead we emulate a (very buggy!) BIOS
- No GPU switching. Only the hot and power hungry AMD GPU is exposed and enabled
- Minimal power and cooling management. Building Gentoo in a VM got the system up to a recorded 117 degrees Celsius in Speccy!
- Hard disk in IDE mode only, not SATA! Unless you booted up OS X and ran some dd commands on the partition table to "trick" it into running as a SATA mode disk
The absolute, crushing cynic in me has always felt that this was a series of intentional steps. Both a "minimum viable engineering effort" and a subtle way to simply make Windows seem "worse" by showing it performing worse on a (forgive the pun) "Apples to Apples" configuration. After all, Macs are "just Intel PCs inside!" so if Windows runs worse, clearly that's a fault of bad software rather than subtly crippled hardware
I have the feeling that Apple just cares about Apps for iOS (money wise). What's the minimum they need to do so people write iOS apps?
If this hardware, incidentally, is good for your use case, all is good. If not, they might just shrug it off and decide you're too niche (i.e. not adding much value to their ecosystem) and abandon you.
They chose to make the Mac Pro some kind of halo product, I guess. But really, the slice of people who need more power than an iMac, and less than this "Linux tower full of GPUs" or a render farm, they judge to be very small indeed. This wasn't true in the 90s, when laptops (and super-slim desktops) came with much bigger compromises.
Important to note 'some' here. I'm a developer and power user, and haven't had a desktop computer in almost 10 years.
You mean... software developers? The same people who almost universally use a Mac?
This is where I'm at.
I don't know if other people are built from sturdier stuff than me or what, but typing on a laptop to any significant extent leaves me with tendonitis for several days. And staring at a laptop screen too long leaves me with neck pain.
Laptops are a nightmare in terms of ergonomics.
It's been a bit of a blessing for me because I only have a laptop at home, and it basically means I can't take work home with me.
But I'm pretty seriously considering upgrading to a traditional desktop sometime in the next year.
I also use a Wacom tablet comfortably placed on a table to my right.
This has become steadily less true since about 2012, in my experience. I don’t know any full time developers still using an Apple laptop. The keyboard situation caused a lot of attrition. I finally stopped support for all Apple hardware at my company months ago, simply to get it out of my headspace. Will Fusion360 again be completely broken by an Apple OS update? Am I going to have to invest time making our Qt and PyQt applications work, yet again, after an Apple update? Are Apple filesystem snapshots yet again going to prove totally defective? The answer is “no”, because we really need to focus on filling customer orders, so we’re done with Apple. ZFS snapshots function correctly. HP laptop keyboards work ok. Arch Linux and Windows 10 (with shutup10 and mass updates a few times per year) get the job done without getting in my face every god damned day.
Fascinating. I can name a few startups in my town that use Apple. One just IPO'd (Root), another is about to (Upstart). There are others as well.
The big companies it's hit or miss. Depends on if they are working on big enterprise applications or mobile/web. Mobile and web teams are all on MacBook Pros, and the big app dev teams aren't.
When I was last in Mountain View they were on Mac as well but I know that depends on personal preference.
* in very specific places and conditions.
Actual numbers from every single credible survey puts macs at a grand maximum of 25%.
For a while OS X had the edge because it had a nice interface while still offering a lot of Unix. Now Windows and Linux have caught up in the areas they were lacking before. Meanwhile Apple has been caring less and less about people using the CLI.
Memory bandwidth is one key feature impacting M1's performance. When Apple builds an ARM-based MacPro, we can expect something with at the very least 5 DDR5 channels per socket. It's clear, from this, the M1 is a laptop/AIO/compact-desktop chip.
So yes, at the very least 8 DDR4 channels, or one per core, but I'd expect more from a workstation-class board.
Now, speaking of the board, all those memory channels will be funny.
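For rough scale, the back-of-the-envelope bandwidth arithmetic looks like this (a sketch assuming DDR4-3200 and the standard 64-bit channel width; the speed grade is my assumption, nothing Apple has announced):

    // Peak theoretical bandwidth per DDR4 channel, and for an 8-channel board
    let bytesPerTransfer = 8.0                   // 64-bit channel = 8 bytes per transfer
    let transfersPerSecond = 3_200_000_000.0     // DDR4-3200 = 3200 MT/s (assumed speed grade)
    let perChannelGBps = bytesPerTransfer * transfersPerSecond / 1e9   // ~25.6 GB/s per channel
    print(perChannelGBps, perChannelGBps * 8)    // ~25.6 GB/s, and ~204.8 GB/s for 8 channels

So an 8-channel workstation board would land around 200 GB/s peak, roughly 3x the M1's ~68 GB/s.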
However the M1 is already pretty large (16B transistors), upgrading to 8 fast cores is going to significantly increase that. Maybe they will just go to a dual CPU configuration which would double the cores, memory bandwidth, and total ram.
Or move the GPU and ML accelerator offchip.
I'm guessing very few developers need the extra power a desktop offers over a high-end laptop.
Those requirements don’t dictate a desktop. Also, the physical size of the monitor is irrelevant, it’s the resolution that matters. Your video card doesn’t care if you have a 40” 4K monitor or an 80” 4K monitor, to it, it’s the same load.
The reason I still have a cheese grater Mac Pro desktop at all is because I have 128gb RAM in it and have tasks that need that much memory.
 I’ve connected eight external monitors to my 16” MBP (with laptop screen still enabled, so 9 screens total). I don’t use the setup actively, did it as a test, but it very much works. The setup was as follows:
TB#1 - 27” LG 5K @ 5120x2880
TB#2 - TB3<->TB2 adapter, then two 27” Apple Thunderbolt Displays @ 2560x1440
TB#3 - eGPU with AMD RX580, then two 34” ultrawides connected over HDMI @ 3440x1440, two 27” DisplayPort monitors @ 2560x1440
TB#4 - TB3<->TB2 adapter, then 27” Apple Thunderbolt Display @ 2560x1440
So that’s almost 50 million pixels displayed on around 4,000 square inches of screens driven by a single MBP laptop.
(I kid, I kid)
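For anyone who wants to sanity-check the ~50 million figure, the per-panel tally works out (a quick sketch; the built-in panel is assumed to be the 16" MBP's 3072x1920):

    // Pixel tally for the nine-screen setup described above
    let panels = [
        (5120, 2880),                                // LG 5K
        (2560, 1440), (2560, 1440), (2560, 1440),    // three Apple Thunderbolt Displays
        (3440, 1440), (3440, 1440),                  // two 34" ultrawides via the eGPU
        (2560, 1440), (2560, 1440),                  // two 27" DisplayPort monitors via the eGPU
        (3072, 1920)                                 // built-in 16" MBP panel (assumed)
    ]
    let totalPixels = panels.reduce(0) { $0 + $1.0 * $1.1 }
    print(totalPixels)                               // 48,983,040 -- just shy of 50 million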
Furthermore, how is that relevant to the point _I_ was making about needing more than 64gb of RAM? If you both want to tangent, fine do so, but don’t try to put words in my mouth while doing it.
That's called "using an example" or an "illustrative example". For comparison, I used a type of RAM that is traditionally much more expensive than what you find in laptops.
> This is on “processor package” RAM and will thus have an entirely different price basis than a removable stick would,
1) The same price is being asked for RAM in non-M1 models.
2) You could put any price tag you want on it: because the item is single-sourced, the vendor can pull a quote out of thin air and you cannot find an exact equivalent on the market. Therefore, for comparison, a functionally and parametrically similar item is being used.
> how is that relevant to the point _I_ was making about needing more than 64gb of RAM?
You get a different product, that supports more RAM.
> If you both want to tangent, fine do so, but don’t try to put words in my mouth while doing it.
Could you point out where I did that? I was pointing out that your note about the GP being hyperbolic is untrue - he was in the ballpark.
It's about as in the ballpark as $80 would be; both are off by 2.5x. Claiming they are “the same order of magnitude, so it’s not hyperbolic” is laughable. $100k and $250k are both the same order of magnitude, but are radically different prices, no?
If more than a niche had speed as its sole priority, then they would already use desktops, but most (80%+) use laptops today.
But of the majority that uses laptops, most would like a faster machine. Just would prefer it was also a laptop.
At times there seemed to be a real disdain for the people who loved upgrading their machines as well as those who gamed on them. Apple's products were not meant to be improved by anyone other than Apple and you don't sully them with games. The Mac Pro seems to be the ultimate expression of "You are not worthy", from the base system which was priced beyond reason to the monitor and stand. It was the declaration of "fine, if you want to play then it will cost you", because they didn't really care about enthusiasts with the "wrong" uses - games and such.
So it’s not that hackintosh builders are anything, at all, it’s that they’re outnumbered by iPhone buyers a million to one.
Just looking at the first sentences:
GP: > I have zero issues with an Apple premium or paying a lot for hardware.
parent: > the hackintosh/enthusiast market [...] are the most price conscious segment
There's an enormous price gap between a Mac Mini and the Mac Pro (especially when the Mini now has higher single-threaded performance than the base Pro...) which Apple has widened in the last decade or two.
The 2013 mac pro was a mess. pass.
The latest mac pro... I think it wasn't just expensive, it was sort of sucker expensive.
I appreciate that the 2013 mac pro wasn't for you, but it was perfect for me: small but powerful. Firstly: RAM. I was able to install 64 GiB on it, which enabled me to run Cloud Foundry on ESXi on Virtual Workstation on macOS. Non-Xeon chipsets maxed-out at (IIRC) 16 GiB and then later 32 GiB—not enough.
Secondly, size & esthetics: it fits on my very small console table that I use as a desk. I have a modest apartment in San Francisco, and my living room is my office, and although I had a mini-tower in my living room, I didn't like the looks.
Third, expandability: I was able to upgrade the RAM to 64 GiB, the SSD to 1 TB. I was able to upgrade the monitor to 4k. It has 6 Thunderbolt connections.
My biggest surprise was how long it has lasted: I typically roll over my laptops every year or so, but this desktop? It's been able to do everything I've needed it to do for the last 7 years, so I continue to use it.
[edited for grammar]
Part of the "mess", I'd argue, was that Apple backed themselves into a thermal corner where they couldn't update the machine but also wouldn't cut its price so it got steadily worse value as time wore on.
This has long been an issue for Apple products. It's why the best time to buy an Apple product is right after an update.
... then I went to NewEgg and got 192GB of memory for $800ish, rather than Apple's exorbitant $3,000. And seriously, why? Same manufacturer, same specs. And convenience factor? It took a good 45 seconds to install the memory, and I'd wager anyone could do it (it's on the 'underside' of the motherboard, all by itself, and has a little chart on the memory cover to tell you exactly what slots to use based on how many modules you have).
And then I bought a 4x M.2 PCIe card and populated it with 2TB SSDs (that exceed the Apple, with sustained R/W of 4500MB/s according to Blackmagic) for just around $1,100, versus the $2,000 Apple wanted. Only downside is that it cannot be the boot drive (or maybe it can, but it can't be the _only_ drive).
It's the kind of Mac that makes you get an iMac to put on your desk and a beefy Linux server-grade box you hide somewhere, but that does all your heavy lifting.
Ideally the enhanced cooling from the Pro models would trickle down to the non-Pro. By all reports the (i)Mac Pro is virtually silent but in the low-power ARM world a desktop machine that size could almost be passively cooled, even under load.
I bet Apple would love to release an all-in-one iMac Pro powered by an iteration on the M1. They could put a Dolby Vision 8k display in it and drag race against Threadripper machines for UHD video workloads.
Outside of the Mac mini, the most powerful desktop machines were actually iMacs, with all the compromises that come with the form factor, and the trashcan Mac Pro, which was thermally constrained.
In that period, no amount of money would have helped to get peak storage + network + graphic performance for instance.
We are now in a slightly better place where as you point out, throwing insane amounts of money towards Apple solves most of these issues. Except for those who don't want a T2 chip, or need an open bootloader.
I don’t know that the “Apple tax” moniker is really fair anymore, either.
The machines have always commanded a premium for things that enthusiasts don’t see value in (I.e. anything beyond numeric spec sheet values), so most critics completely miss the point of them.
There’s a valid argument to be made that they’re also marked up to higher margins than rivals even beyond the above, but I’m not sure if any end user has really ever eaten that cost - If you buy a MacBook, there has always been someone (students) to buy it back again 3/5/10 years down the road for a significant chunk of its original outlay. That doesn’t happen with any other laptop - they’re essentially scrap (or worth next to nothing) within 5 years. After 10 years I might actually expect the value to be static or even increase for its collector value (e.g. clamshell iBook G3s)
The total cost of ownership for Apple products is actually lower over three years than any rival products I’m aware of.
It's not just intangibles. I really like using Macs, but my latest computer is a Dell XPS 17. This is not a cheap computer if you get the 4k screen, 64GB of RAM and the good graphics card. At those prices, you should consider the MBP16. The MBP is better built, has a better finish and just feels nicer.
Thing is, Dell will sell me an XPS 17 with a shitty screen because I don't care about the difference and would rather optimise battery life. I can get 3rd party RAM and SSDs. I can get a lesser graphics card because I don't need that either. I can get a more recent Intel CPU. And I can get the lesser model with a greater than 25% discount (they wouldn't sell me the better models with a discount though).
I think some of the Apple Tax is them not being willing to sell you a machine closer to your needs, not allowing some user-replaceable parts and not having discounts.
Example: I've been looking at X1 Nano. It is improvement compared to other lines (it has 16:10 display finally!), but it is still somewhere in the middle of the road.
The competitor from Apple has slightly better display, much better wifi and no option for LTE/5G.
Nano has a 2160x1350 450-nit display with Dolby Vision. Apple has a 2560x1600 display at 400 (Air)/500 (MBP) nits with P3. The slightly higher resolution means that Apple would display 9 rendered pixels using 8 physical ones when using the 1440x900@2X resolution (177% scale), but to get a similar scale on the Nano you would be displaying 8 rendered pixels using 6 physical (150% scale). Similarly, Dolby Vision is an unknown (how could it get used?), while the P3 from Apple is a known.
X1 Nano has 2x2 MIMO wifi - Intel AX 200 - with no option for anything better. There are only two antennas in the display frame, you cannot add more (ok, 3, but the third one is for cellular, and cannot be used for wifi if you forego cellular). Apple ships with 4x4 MIMO. If you have decent AP at office or home, it is a huge difference, yet no PC vendors are willing to improve here.
The cellular situation is the exact opposite. You can get cellular module for Thinkpads, and you cannot for Apple, at all, so if you go this route, you have to live with workarounds.
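To make the scaling arithmetic in the display comparison concrete (a sketch; macOS's scaled HiDPI modes render at 2x the logical resolution and downsample to the physical panel):

    // "Looks like 1440x900" on a 2560-pixel-wide panel vs. a 2160-pixel-wide panel
    let renderedWidth = 1440.0 * 2        // the 2x backing store is 2880 px wide
    print(renderedWidth / 2560.0)         // 1.125   -> 9 rendered pixels per 8 physical (MBP/Air)
    print(renderedWidth / 2160.0)         // 1.333…  -> 8 rendered pixels per 6 physical (Nano)

The closer that ratio is to 1, the less aggressive the downsampling, which is the point being made above.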
Example - a Mac is a Mac for resale purposes - if I attempt to later sell an XPS that I've opened up and put an SSD in and a couple of SODIMMS - I now need to recoup my cost on all of those things. The problem is that if someone is looking at a used XPS with upgraded SSD and upgraded RAM they're statistically unlikely to fully investigate and value the (probably really good) parts that you upgraded it with - they're just going to see X,Y,Z numbers and price accordingly.
Generally though, a 5 year old Windows laptop with 16GB RAM still commands the value of a 5 year old Windows laptop as best I could tell looking at resale values.
I think it's still accurate and honestly that's apple's business model.
I think the resale value for a student macbook doesn't really matter. It still costs the student - while they are poor - as much as 4x what other students pay for their laptop. Many students are paying $250 for their laptop.
I couldn't sell that device for half of that a year and a half later. I got a newer laptop in 2016, again very specced out for a laptop. About 1800€, couldn't sell it for 800€ 2 years later. I still use that last one because I didn't want to sell it so far under what the market value should be.
If you try to sell anything Apple related that isn't more than 5 years old you won't have that problem at all. You can get a good value for the device and sell it without too much of a hassle.
Even if you're a student you would likely be better off buying the cheapest macbook you can find (refurbished or second hand if needed). If you don't like the OS you can just install Windows or a Linux distro on it.
Are you sure about that $250 number? Because I don't think that's a very realistic number.
For note taking, word processing, basic image editing, web browsing, video playing, etc, you can easily get a capable enough laptop for that price.
This is not comparing like-for-like in terms of what the machines can do, of course. Apple's range doesn't even remotely try to cover that part of the market so a direct comparison is unfair if you are considering absolute price/capability of the devices irrespective of the user's requirements, but for work that doesn't involve significant computation that bargain-basement unit may adequately do everything many people need it to do (assuming they don't plan to also use it for modern gaming in non-working hours).
> If you try to sell anything Apple related that isn't more than 5 years old you won't have that problem at all.
Most people don't consider the resale value of a machine when they buy it. For that to be a fair comparison you have to factor in the chance of it being in good condition after a couple of years' use (this will vary a lot from person to person) and the cost of any upgrades & repairs needed in that time (again more expensive for Apple products by my understanding).
And if you buy a $500 laptop and hand it down or bin it, then you are still better off (assuming you don't need a powerful machine) than if you dropped $3,000 for an iDevice and later sold it for $2,000.
> what the market value *should* be.
"Market value" is decided by what the market will bare, not what we want to be able to sell things for, and new & second hand are often very different markets.
I'm not a student but it's pretty close I think.
I invested $350 into a Chromebook that runs native Linux about 4 years ago and it's still going strong as a secondary machine I use when I'm away from my main workstation.
It has a 13" 1080p IPS display, 4gb of memory, an SSD, a good keyboard and weighs 2.9 pounds. It's nothing to write home about but it's quite speedy to do every day tasks and it's even ok for programming where I'm running decently sized Flask, Rails and Phoenix apps on it through Docker.
If I had to use it as my primary development machine for web dev I wouldn't be too disappointed. It only starts falling apart if you need to do anything memory intensive like run some containers while also running VMs, but you could always spend a little more and get 8gb of memory to fix that problem.
I'm sure nowadays (almost 5 years later) you could get better specs for the same price.
I love Chromebooks, don't get me wrong, but the problem I've come to realise over time is that many are specced and priced just about at a point where they'll quickly move into obsolescence not long after purchase - at which point the only thing keeping them out of the ground is your willingness to tolerate them after the updates have stopped.
The Mac will still be worth a good chunk of money to someone.
I have a Chromebook Flip here that I adored for several years that I couldn't give away now.
For example if it works well enough for another 4 years, now we need to ask the question on whether or not you could get reasonable value out of an 8+ year old Mac. I never sold one so I'm not sure. My gut tells me it's going to be worth way less than what you bought it for even if it's in good condition.
But more generally, yeah I have no intentions on re-selling this thing if I decide I'm done with it before it physically breaks. I'd probably donate it or give it away for free (if someone wanted it).
I don't see that as too bad tho. If I can get 7-8 years out of $350 device I'm pretty happy, especially if the next one costs about the same.
It's a tough comparison tho because a decently decked out MBP is going to be like 8x as expensive but also have way better specs.
I think it's fairly realistic. The Dell Latitude 7250 is probably a good representative of what you can get used for ~$220-$300 US these days: https://www.ebay.com/sch/i.html?_from=R40&_trksid=p2380057.m... The dual-core processor should still be serviceable for everyday work, at ~1.3kg it's light enough to carry around all day, a 1080p resolution should be OK on a 12" screen, and it can take up to 16GiB of RAM, though holding out for one with 16GiB preinstalled will definitely tend to push the cost up to nearer $300: https://www.ebay.com/sch/i.html?_from=R40&_trksid=p2380057.m...
(Then any laptop with similar specs except with a 2-in-1 form factor tends to cost a fair bit more, but that's not a must-have for most students or anyone who might have been considering a MacBook.)
I got an i7 T430s for 200€ a few years ago and it's still plenty fast for coding, so I don't see why this number wouldn't be realistic.
It's literally the only thing that matters if you're the seller.
If you have your choice of two items to sell 5 years from now, you ideally want to be selling the item that's worth substantially more to the buyer, rather than trying to sell something worthless.
Assuming there's nothing dishonest happening, it's really up to the market to price.
Thing is, the student buying the MacBook is probably going to be substantially better off that way too, in that it will likely retain proportionally more of its value from that point too.
Apple on the new Mac Pro that I got a month ago: 192GB memory? That will be $3,000. NewEgg? We'll sell you the same specced memory from the same manufacturer for $800. And you get to keep/sell the baseline 32GB memory.
8TB SSD? $2,000, thanks. OWC and NewEgg? Here, have a PCIe 4xM.2 card and 4 2TB SSDs for $1,100. Oh, and they'll be 50% faster, you just can't have them as the only drive on the system (my Apple SSD runs at around 2800MB/s, the alternative, 4500MB/s).
So they are entirely marked up, and look in any forum - by far most people are not doing what I'm doing, and just "going straight Apple for convenience", though the memory installation was less than 1 minute, and the SSD installation less than 5, including unboxing, seating the 4 drives, reinstalling the heatsink on the card and installing. I get "my time is money", and "it just works" (which, as we know, more and more is less the case with Apple), but really, for me, that was a $3,100 savings for <10 minutes effort.
The M1 costs Apple relatively little to produce per unit - I would expect them to keep the overall design for a Mac Pro but have stacked modules such that the side wall of the Mac Pro is a grid of 4 or more such modules each with co-located memory like the M1 has. Obviously performance would depend upon the application being amenable to a design like that but a 32 or 64 5nm core Mac Pro is not out of the question, and would be impossible to match for performance in the next few years by any Hackintosh.
Even after capacity frees up at TSMC for AMD to move to 5nm, they won’t be able to co-locate memory like the M1 does due to standards compliance with DRAM sticks.
I think the next couple of years will be really turbulent for other vendors - the M1 is likely far more significant for the PC market due to how disruptive it is than it is for the Mac market.
I would like to know if they are using fundamentally better batteries, and how much a 5nm process lead is behind this.
But I will hand it to Apple, if they finally did something to break the 4-8 hour battery life limit, a limit that always seemed to stay the same despite node shrink after node shrink after node shrink, and really about the same on-screen performance for usual browsing/productivity application use.
I was pretty distrustful of the ARM move, but if they deliver this for the Macbook Pro, I'll hop to ARM.
Associated with the CPU people "getting serious" is them pushing an OS, which would have to be Linux. Intel should have done this 20 years ago, at least as leverage to make Windows improve itself.
> stacked modules such that the side wall of the Mac Pro is a grid of 4 or more such modules each with co-located memory like the M1 has
Quad or more package NUMA topology?? The latency would absolutely suck.
Especially if the margins allow them to not engage in silliness on the software side of things like violating privacy and serving ads in the OS.
There are certainly places that Apple can be criticized, but I think in these two areas they're acting pretty well.
Er, I heard that junk doesn’t install itself on “Pro for Workstations,” but I’m not certain and even then that’s another hundred dollars more expensive than Pro.
And even Enterprise still comes with a lot of junk that, like, 90% of users won’t need. Windows AR? Paint 3D? And so on… half the things in the start menu of a stock Windows 10 Pro install are either crap like Candy Crush, fluff like 3D viewer, or niche like the Windows AR thing.
The worst part about this is that there’s definitely a middle ground between not including anything and pushing crap on people — both nearly every Linux distro I have ever seen, as well as Apple nail that balance, and to be frank with the App Store or Microsoft Store and such I really don’t see the need to include hardly anything.
You can disable some of the telemetry during install as well.
Install it in a virtual machine, every 90 days make a new virtual machine from scratch. That or use the secret code to reset the demo days. Enter an Enterprise key when you want to register it for real.
I would posit that Apple is always going to keep macOS working on some workstation-class hardware, just because that kind of machine is what Apple's software engineers will be using internally, and they need to write macOS software using macOS.
Which means one of two things:
1. If they never release a workstation-class Apple Silicon chip, that'll likely mean that they're still using Intel/AMD chips internally, and so macOS will likely continue to be compiled for Intel indefinitely.
2. If they do design workstation-class Apple Silicon chips for internal use, they may as well also sell the resulting workstation-class machines to people at that point. (Or, to rearrange that statement: they wouldn't make the chips if they didn't intend to commercialize them. Designing and fabbing chips costs too much money!)
Which is to say, whether it be a Hackintosh or an Apple Mac Pro, there's always going to be something to cater to workstation-class users of Apple products — because Apple itself is full of workstation-class users of Apple products.
I hope I'm not out of line here, but this is not what a "workstation" is. "Workstation" actually has a specific meaning in the realm of enterprise computing solutions, and developers do not (generally) use workstations.
A workstation is something that, say, the people at Pixar use, or Industrial Light and Magic. It's an incredibly powerful machine that can handle the most intensive of tasks. Software development is generally not such a task, unless you're frequently re-compiling LLVM from source or something. (And even then, it's a world of difference.)
Apple's software developers, like most software developers who use Apple machines, use MacBook Pros (for the most part). Sometimes Mac Minis if they need multiple test machines, and I'm sure there are some who also have Mac Pros. But overwhelmingly, development is done on laptops that they dock while at work and take home with them after. (This was my experience when I interned there, anyway.)
But moreover, changes to foundational macOS libraries can cause regressions in the performance of this type of software, and so macOS developers working on systems like Quartz, hardware developers working on the Neural Engine, etc., also work with these apps and their datasets as regression-test harnesses.
See also: the Microsoft devs who work on DirectX.
All of this testing requires "workstation" hardware. (Or servers, but Apple definitely isn't making server hardware that can run macOS at this point. IIRC, they're instead keeping macOS BSD-ish enough to be able to write software that can be developed on macOS and then deployed on NetBSD.)
I have always hoped that we could rely on that heuristic - that internal Apple usage of their own products would guarantee that certain workflows would be unbroken.
In practice, this has never held up.
Over the past 10-12 years it has been reinforced over and over and over: Apple engineers use single monitor systems with scattered, overlapping windows which they interact with using mousey-mousey-everything and never keyboard shortcuts.
They perform backups of critical files - and manage financial identities - using their mp3 player.
The fact that multiple monitors - and monitor handoff - is broken in fascinating new ways with every version of OSX tells you how Apple folks are (and are not) using their own products.
If so, what is the connection to professional workflows on macOS?
16 cores, 32GB of RAM, least powerful GPU on the list, 2TB of storage: $8,799.00
28 cores, 48GB of RAM, same GPU, 4TB of storage: $14,699
MSRP on a 2020 Toyota Corolla $19,600
AMD Ryzen Threadripper 3970X 32-Core 3.7 GHz Socket sTRX4 $2,629.90
Cost of the same basic GPU about $219
Cost of a complete system equivalent to the almost-car-priced Mac: about $4,200.
9k-15k isn't "a lot", it's a crazy amount. 15k is 1/4 of the median household's income.
Most of planet earth can't sink 6000 into a computer let alone 15000. Under Apple the standard expandable board in a box with room to expand is a category available to 1% of the US and 0.1% of the world.
The new Mac Pro is 3-4x the price of a machine built around AMD having equivalent performances. I'm building a Threadripper for exactly this reason. Most of the issue is Intel vs AMD and the fact that AMD's Threadrippers are an amazing deal when it comes to performance per dollar and that Apple has an aversion to offering decent GPUs
If it was 1/2 the price I think it could fairly be called premium priced.
Buying an iMac now would seem to be a poor decision.
From what I'm seeing in some of the comments, people are so lost in the history of the past years of Apple being the pooch ridden from behind on performance, that they can't get their heads out of their arses to see how awesome this is.
I am sitting here right now wondering if I should invest more of my savings directly in Apple stock, at least temporarily to ride their sales wave, or if I should buy a Mac mini and a nice wide curved monitor with a mechanical keyboard from WASD and be f'ing awesome all of a sudden.
The only reason I'm not hitting the buy button is that all of that isn't $135. There's no reason for that amount, but if it said $135, I'd have already paid for it and been drinking beer to celebrate the happiest purchases I ever made.
Yes, I own an iMac, as this is the closest to a desktop machine Apple sells, but a replacement for what the Mac Pro used to be, it is not.
I got curious...
- amd 3990x 64-core 128 thread $3849
- 256gb g.skill ddr 3600 $978
- noctua nh-u14s cooler $80
- samsung 980 pro 1tb nvme pcie4 $229
- evga 1000w power supply $207
- fractal design define 7 $170
- asus trx40-pro mb $396
- nvidia rtx 3080 $699 (?)
- steve jobs: a biography $15 (hardcover)
= a really maxed out system $6623
EDIT: ok, I had to know...
- amd 5950x 16 core 32 thread $799
- 128gb g.skill ddr 3600 $489
- noctua nh-u12s cooler $60
- samsung 980 pro 1tb nvme pcie4 $229
- evga 850w power supply $139
- fractal design define 7 compact $130
- asus x570-pro mb $240
- nvidia rtx 3070 $499 (?)
- steve jobs: a biography $14 (kindle)
- linux with kde mac-look icons $0
= a really really great system $2599
You'll save a fair bit ($600 or so), run quite a bit cooler, it will be much easier to keep quiet, and you'll have twice the disk space. Or buy 2x2TB NVMe (motherboards with 2x m.2 are common these days).
Sure the 5600x/5700x isn't as fast in throughput, but how often do you max more than 6/8 cores? Per-core performance is near identical and with more memory bandwidth per core you run into fewer bottlenecks.
I bet over a few years more people would notice double the disk than the missing extra cores.
From a discussion I had with a friend recently, I found that Precision workstations from Dell or Z workstations from HP have similar prices for similar performance (sometimes prices can reach 40k or 70k dollars).
When comparing the Mac Pro to an enthusiast PC build, yes the Mac Pro is "overpriced", but the Mac Pro is using a Xeon, which is pricier than a Ryzen (even if performance-wise it's inferior), and a pro GPU, which also costs more than a consumer GPU (again, even if performance is inferior). The price of an Nvidia Quadro is always higher than a GeForce GPU with the same specs.
You can see a spec/price comparison I did when having the discussion with my friend here : https://mega.nz/file/Nj4UnSJR#fBdZfn3zoZ8boxap35-GWEgDlicH3R...
For that market, the Mac Pro is overpriced (the high-end Dell/HP workstations are too), and Apple doesn't make anything more suited for it. That's the criticism. That the Mac Pro is acceptably priced compared to the Dell/HP workstations doesn't matter if that's not what you need.
For me personally, outside of laptops and phones, I don't see the appeal of using Apple hardware (unless you want to use MacOS X).
A professional workstation, with support, services, guaranteed replacement components, guaranteed service-life, maintenance contracts and so on is very different from an enthusiast-built machine.
It's like comparing a BMW M3 to a tricked out VW Golf. You can fit a bigger, badder engine under the VW's hood, stiffen the suspension, replace the gearbox and so on but, in the end, you can get one straight from the dealer and not everyone is inclined to assemble a car from parts.
Did that once. It's fun, educational and not very practical.
The leasing thing is for a slightly different reason: it is used by a market segment that always wants something new. They would not drive an older car, even if it was reliable; it would not be cool enough. Unfortunately, since circa 2010 BMW also figured this out, and since then their cars have stopped being good -- they don't have to last -- and are just expensive.
In Europe the incentive to buy expensive goods as a company (like cars, fancy office furniture, etc.) is even bigger because of VAT (much bigger than US sales tax).
So a machine that's £5k retail becomes £4167 without VAT, effectively £3333 if you take into account tax savings, which only apply if your company is in profit. A £15K machine effectively would still cost you £10k.
It's a big saving, sure, but it's still a very expensive machine.
You're right if you start looking at "Well I run my own company so the cost compared to paying myself that cash as a dividend is much smaller", but that only really applies to those of us who do run our own small companies, own them fully and run them profitably, and have already pumped their personal earnings up to that level. And then we're on to a question about what that box is for and why it's needed, is it a company asset or a personal one?
And remember that you get to apply the same percentage discount to any other machine - your 15K apple box may come down to a conceptual £5K hit on your pocket, if you're paying 50% personal tax on top of the company taxes, but a £4-5k Zen 3 box with dual nvidia 3090s in it will come in at £1333-£1600 by the same metric and quite likely perform better...
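Spelled out, that "same metric" is roughly: strip 20% VAT, apply ~20% corporation-tax relief, then halve the remainder against a ~50% marginal personal rate (a sketch using the round numbers in this thread; real rates, reliefs and dividend taxation are messier):

    // Effective personal cost of a company purchase, using the rough rates above
    func effectiveCost(_ retail: Double) -> Double {
        let exVAT = retail / 1.2           // reclaim 20% VAT
        let afterRelief = exVAT * 0.8      // ~20% corporation tax relief (profitable company assumed)
        return afterRelief / 2.0           // vs. paying yourself the same money at ~50% marginal tax
    }
    print(effectiveCost(15_000))           // ~5,000  -- the "conceptual £5K hit"
    print(effectiveCost(4_000))            // ~1,333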
But I wasn't really here to talk about comparative value anyway - this was a tax discussion!
However I feel no particular guilt that the workstation I use for my full-time dev day-job also has a windows partition for gaming in the evening, and I hope that the tax authorities would see things the same way! It's not like the asset isn't a justified business purchase.
It’s an order of magnitude different with the Mac Pro, the base model is a $6000 machine that will perform like a ~$1500 PC. And the base model makes no sense to buy, it’s really a $10k-$30k machine. It’s a completely different product category.
The monitor on my MacBook Pro just died, and I bought it July of last year. The repair was about $850 USD. Luckily my credit card covered the hardware warranty, but I'm kind of wishing I'd bought AppleCare.
> Retain and release are tiny actions that almost all software, on all Apple platforms, does all the time. […] The Apple Silicon system architecture is designed to make these operations as fast as possible. It’s not so much that Intel’s x86 architecture is a bad fit for Apple’s software frameworks, as that Apple Silicon is designed to be a bespoke fit for it […] retaining and releasing NSObjects is so common on MacOS (and iOS), that making it 5 times faster on Apple Silicon than on Intel has profound implications on everything from performance to battery life.
> Broadly speaking, this is a significant reason why M1 Macs are more efficient with less RAM than Intel Macs. This, in a nutshell, helps explain why iPhones run rings around even flagship Android phones, even though iPhones have significantly less RAM. iOS software uses reference counting for memory management, running on silicon optimized to make reference counting as efficient as possible; Android software uses garbage collection for memory management, a technique that requires more RAM to achieve equivalent performance.
(It explains why iOS does better with less ram than android, but the quote is specifically claiming this as a reason for 8GB ram to be acceptable)
Perhaps they never really needed to fit 32GB into their intel macs either.
Some days after the glowing reviews, and strange comments about magic memory utilization, we now see comments concerned about SSD wear due to swap file usage.
If applications and data structures are more compact in memory on the ARM processors, it should be easy to test: you just need an Intel Mac and an M1 Mac running the same app on the same document, and look at how much memory each uses.
So on consumer PCs, RAM is maybe used more for caching, or eaten away by memory leaks and poorly designed garbage collection.
And if your GPU is able to do real-time rendering on data-heavy loads, maybe you need less caching of intermediate results as well.
2. My current production server is a PostgreSQL database on a 16GB RAM VM running on Debian (my boss is stingy). This doesn't prevent me from managing a 300GB+ data cluster with pretty decent performance and performing actual data analysis.
3. If Chrome sometimes uses 8GB+ for a web browser, for god's sake, the only explanation is poor design; there is no excuse.
It's possible that apps have been completely overhauled for a baseline M1 experience. Extremely, extraordinarily unlikely that anything remotely of the sort has happened, though. And since M1-equipped Macs don't have any faster IO than what they replaced (disk, network, and RAM speeds are all more or less the same), there wouldn't be any reason for apps to have done anything substantially different.
Third, Marcel Weiher explains Apple’s obsession about keeping memory consumption under control from his time at Apple as well as the benefits of reference counting:
>where Apple might have been “focused” on performance for the last 15 years or so, they have been completely anal about memory consumption. When I was there, we were fixing 32 byte memory leaks. Leaks that happened once. So not an ongoing consumption of 32 bytes again and again, but a one-time leak of 32 bytes.
>The benefit of sticking to RC is much-reduced memory consumption. It turns out that for a tracing GC to achieve performance comparable with manual allocation, it needs several times the memory (different studies find different overheads, but at least 4x is a conservative lower bound). While I haven’t seen a study comparing RC, my personal experience is that the overhead is much lower, much more predictable, and can usually be driven down with little additional effort if needed.
ARC is not specific to the M1, BUT it has been widely used in ObjC & Swift for years AND is thus heavily optimized on the M1, which performs "retain and release" way faster (even when emulating x86).
Perfect illustration of Apple software+hardware long term strategy.
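For anyone wondering what is actually being counted here: under ARC the compiler inserts retain/release calls around ordinary assignments of object references, so they fire constantly in normal code. A minimal Swift sketch (the comments describe what the compiler emits, not calls you write yourself, and the optimizer can elide some of them):

    class Node { var next: Node? }       // an ordinary reference-counted class

    func demo() {
        let a = Node()                   // refcount = 1
        var b: Node? = a                 // compiler emits a retain  -> refcount = 2
        b = nil                          // compiler emits a release -> refcount = 1
        // when `a` goes out of scope a final release runs and the object is freed
    }

Each of those implicit calls is an atomic update of the object's refcount, which is exactly the operation the quote above says the M1 runs several times faster than Intel.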
>This quote doesn’t really cover why M1 macs are more efficient with less ram than intel macs? You’ve got a memory budget, it’s likely broadly the same on both platforms
But both Intel Macs and ARM Macs use RC. Both chips are running the same software.
I don't see how refcounting gives you advantage over manual memory management for most users.
I am going to speculate now, but maybe, just maybe, if some of the silicon that Apple has used on the M1 is used for compression/decompression, they could be transparently compressing all RAM in hardware. Since this is offloaded from the CPUs and allows a compressed stream of data from memory, they achieve greater RAM bandwidth, less latency and less usage for a given amount of memory. If this is the case I hope that the memory has ECC and/or the compression has parity checking....
Are you aware of any x86 chips that utilize this method?
Making it cheaper to create and destroy objects with hardware acceleration, and to do many small, low-cost reclaims without eating all your CPU would be a magical improvement to the JVM, because you could constrain memory use without blowing out CPU. From what's described in TFA it sounds like the same is true for modern MacOS programming.
The JVM already makes it extremely cheap to create and destroy objects: creation is always ~free (just a pointer increment), and then destruction is copying, so very sensitive to memory bandwidth but done in parallel. If most of your objects are dying young then deallocation is "free" (amortized over the cost of the remaining live objects). Given the reported bandwidth claims for the M1 if they ever make a server version of this puppy I'd expect to see way higher GC throughput on it too (maybe such a thing can be seen even on the 16GB laptop version).
The problem with Java on the desktop is twofold:
1. Versions that are mostly used don't give memory back to the OS even if it's been freed by the collector. That doesn't start happening by default until like Java 14 or 15 or so, I think. So your memory usage always looks horribly inflated.
2. If you start swapping it's death because the GC needs to crawl all over the heap.
There are comments here saying the M1 systems rely more heavily on swap than a conventional system would. In that case ARC is probably going to help. At least unless you use a modern pauseless GC where relocation is also done in parallel. Then pausing background threads whilst they swap things in doesn't really matter, as long as the app's current working set isn't swapped out to compensate.
But when swap hits 8-9 GB, its effects start to get very noticeable.
Besides, a lot of memory usage is in web browsers, which must use garbage collection.
Looking at the reviews of M1 Macs, those systems are still responsive and making forward progress at a “memory pressure” that would make my x86 Mac struggle in a swap storm. It seems to come down to very fast access to RAM and storage, large on-die caches, and perhaps faster memory compression.
2 more points:
- All the evidence I've seen is gifs of people opening applications in the dock, which is... not impressive. I can do that already, apps barely allocate at all when they open to "log in to iCloud" or "Safari new tab". And don't we see that literally every time Apple launches Mac hardware? Sorry all tech reviewers everywhere, try measuring something.
- I think the actual wins come from the zillion things Apple has done in software. Like: memory compression, which come to think of it might be possible to do in hardware. Supposedly a lot of other work/tuning done on the dynamic pager, which is maybe enabled by higher bandwidth more than anything else.
Fun fact: you can stress test your pager and swap with `sudo memory_pressure`. Try `-l critical`. I'd like to see a benchmark comparing THAT under similar conditions with the previous generation.
I'm curious about FP/vector performance, but I'm pretty sure it's fine. I'm definitely eyeing a MBP myself! 20 hours of video playback? Crazy...
Because the M1 is similar to the chips used in iOS, hence the comparison is not inappropriate.
>The memory bandwidth on the new Macs is impressive. Benchmarks peg it at around 60GB/sec–about 3x faster than a 16” MBP. Since the M1 CPU only has 16GB of RAM, it can replace the entire contents of RAM 4 times every second. Think about that…
Or that's how I understand this, I don't actually own M1 Mac.
> Besides the additional cores on the part of the CPUs and GPU, one main performance factor of the M1 that differs from the A14 is the fact that’s it’s running on a 128-bit memory bus rather than the mobile 64-bit bus. Across 8x 16-bit memory channels and at LPDDR4X-4266-class memory, this means the M1 hits a peak of 68.25GB/s memory bandwidth.
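That 68.25GB/s is just the bus-width arithmetic (a quick check):

    // M1 memory bus: 8 x 16-bit LPDDR4X channels = 128 bits, at ~4266 MT/s
    let busBytes = 128.0 / 8.0                 // 16 bytes per transfer
    let transfersPerSecond = 4_266_000_000.0   // LPDDR4X-4266
    print(busBytes * transfersPerSecond / 1e9) // ~68.3 GB/s peak, in line with the quoted figure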
The point of the memory bandwidth is so that it never has to swap to disk in the first place.
What? How does memory bandwidth obviate the need for disk swapping?
It is. We know they're using a unified memory architecture, they pointed it out in the presentation.
The reason GDDR isn't typically used for system RAM is that it's higher latency & more power hungry. Like, the GDDR6 memory on a typical discrete card uses more power than an entire M1-powered Mac Mini does.
What "normal" laptop has that?
The latency appears real to me.
Separately from that x86's TSO-ish memory model also imposes a performance cost whether your algorithm needs those guarantees or not. Code sometimes relies on those guarantees without knowing it. Absent hardware support you would need to insert ARM atomics in translated code to preserve those guarantees which on most ARM CPUs would impose a lot of overhead. The M1 allows Rosetta to put the CPU into a memory ordering mode that preserves the expected memory model very efficiently (as well as using 4K page size for translated processes).
They are fast for atomics but still far, far slower than the equivalent non-atomic operation. An add operation takes around half a cycle (upper bound here - with how wide the firestorm core is an add operation is almost certainly less than half a cycle). At 1ghz a cycle is 1 nanosecond. The M1 runs at around 3ghz. So you're still talking the atomic operation being >10x slower than non-atomics.
Which should not be surprising at all. Apple didn't somehow invent literal magic here. They still need coherency across 8 cores, which means at a minimum L1 is bypassed for the atomic operation. The L2 latency is very impressive, contributing substantially to that atomic operation performance. But it's still coming at a very significant cost. It's very, very far from free. There's also no ARM vs. x86 difference here, since the atomic necessarily forces a specific memory ordering guarantee that's stricter than x86's default. Both ISAs are forced to do the same thing and pay the same costs.
How did you arrive at this number?
It's in the post. Half a cycle for an add or less, and cycles are every 1/3 nanosecond. So upper bound for an add would be around 1/6th a nanosecond. Likely less than that still yet, since the M1 is probably closer to an add in 1/8th a cycle not 1/2. Skylake by comparison is at around 1/4th a cycle for an add, and since M1's IPC is higher it's not going to be worse at basic ALU ops.
6 nanoseconds @ 3ghz is 18 cycles. That's on the slow end of the spectrum for a CPU instruction.
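Putting the figures from this sub-thread side by side (a sketch; the ~3GHz clock and the ~6ns uncontended-atomic latency are the numbers quoted above, and the add cost is the stated upper bound):

    // Comparing the quoted costs at ~3 GHz
    let cycleNs = 1.0 / 3.0            // ~0.33 ns per cycle
    let addNs = cycleNs / 2.0          // an add retires in <= half a cycle: ~0.17 ns upper bound
    let atomicNs = 6.0                 // the quoted uncontended atomic latency
    print(atomicNs / cycleNs)          // 18 cycles
    print(atomicNs / addNs)            // ~36x an ordinary add -- comfortably ">10x slower"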
I'm not an EE expert and I haven't torn apart an M1, but Occam's Razor would suggest it's unlikely they made specialized hardware for NSObjects specifically. Other ARC systems on the same hardware would likely see similar benefits.
Kotlin/Native lets us do this comparison somewhat directly. The current and initial versions used reference counting for memory management. K/N binaries were far, far slower than the equivalent Kotlin programs running on the JVM and the developer had to deal with the hassle of RC (e.g. manually breaking cycles). They're now switching to GC.
The notion that GC is less memory efficient than RC is also a canard. In both schemes your objects have a mark word of overhead. What does happen though, is GC lets you delay the work to deallocate from memory until you really need it. A lot of people find this quite confusing. They run an app on a machine with plenty of free RAM, and observe that it uses way more memory than it "should" be using. So they assume the language or runtime is really inefficient, when in reality what's happened is that the runtime either didn't collect at all, or it collected but didn't bother giving the RAM back to the OS on the assumption it's going to need it again soon and hey, the OS doesn't seem to be under memory pressure.
These days on the JVM you can fix that by using the latest versions. The runtime will collect and release when the app is idle.
Specifically, as I understood it, Apple software (written in Objective-C/Swift) uses a lot of retain/release (ARC, Automatic Reference Counting) on top of manual memory management, rather than other forms of garbage collection (such as those found in Java/C#), which gives Objective-C programs a lower memory overhead (supposedly). This is why the iPhone ecosystem is able to run so much snappier than the Android ecosystem.
That said, I don't see how that translates to lower memory usage than x86 programs. I think the supporting quotes he used for that point are completely orthogonal. I don't have an M1 mac, but I believe the same program running on both machines should use the same amount of memory.
I think you can reach a lot more than that. Presumably, on Intel they use something like LZO or LZ4, since it compresses/decompresses without too much CPU overhead. But if you have dedicated hardware for something like e.g. Brotli or zstd, one could reach much higher compression ratios.
Of course, this is assuming that memory can be compressed well, but I think this is true in many cases. E.g. when selecting one of the program/library files in the squash benchmarks:
you can observe higher compression ratios for e.g. Brotli/gzip/deflate than LZO/LZ4.
Edit: apparently, this isn't common knowledge.
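If someone wants to actually measure this on macOS, Apple's libcompression makes a rough comparison easy, at least for the codecs it ships (a sketch: it has no Brotli, so LZMA stands in as the slower, better-ratio codec, and the file path is just an arbitrary example of compressible data):

    import Compression
    import Foundation

    // Compressed-size / original-size ratio for a given algorithm (0 means the encode failed)
    func ratio(_ data: Data, _ algo: compression_algorithm) -> Double {
        let capacity = data.count + 64
        let dst = UnsafeMutablePointer<UInt8>.allocate(capacity: capacity)
        defer { dst.deallocate() }
        let written = data.withUnsafeBytes { (src: UnsafeRawBufferPointer) -> Int in
            compression_encode_buffer(dst, capacity,
                                      src.bindMemory(to: UInt8.self).baseAddress!, data.count,
                                      nil, algo)
        }
        return Double(written) / Double(data.count)
    }

    let sample = try! Data(contentsOf: URL(fileURLWithPath: "/usr/share/dict/words"))
    print(ratio(sample, COMPRESSION_LZ4))     // fast, modest ratio
    print(ratio(sample, COMPRESSION_LZMA))    // slower, noticeably better ratio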
GC vs RC is not a trivial comparison to make, but overall there are good reasons new systems hardly use RC (Objective-C dating back to the 90s isn't new). Where RC can help is where you have a massive performance cliff on page access, i.e. if you're swapped to disk. Then GC is terrible because it'll try and page huge sections of the heap at once where as RC is way more minimal in what it touches.
But in most other scenarios GC will win a straight up fight with an RC based system, especially when multi-threading gets involved. RC programs just spend huge amounts of time atomically incrementing and decrementing things, and rummaging through the heap structures, whereas the GC app is flying along in the L1 cache and allocations are just incrementing a pointer in a register. The work of cleaning up is meanwhile punted to those spare cores you probably aren't using anyway (on desktop/mobile). It's tough to beat that by hand with RC, again, unless you start hitting swap.
If the M1 is faster at memory ops than x86 it's because they massively increased memory bandwidth. In fact I'd go as far as saying the CPU design is probably not responsible for most of the performance increase users are seeing. Memory bandwidth is the bottleneck for a lot of desktop tasks. If an M1 core is, say, 10% faster than an x86 core, but you have more of them, memory bandwidth is really 3x-4x larger, and the core can keep far more memory ops in flight simultaneously, that'll explain the difference all by itself.
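To make the retain/release traffic described above concrete, here is a minimal Swift sketch (the type and function names are mine, purely illustrative) of what ARC conceptually emits around a single pointer store. Each such pair is an atomic read-modify-write on the object's header, which is exactly the per-object overhead being discussed:

    final class Node {
        var next: Node?
    }

    func link(_ a: Node, _ b: Node) {
        a.next = b
        // What the compiler conceptually inserts for that one store:
        //   swift_retain(b)          // the new value gains a reference
        //   swift_release(oldNext)   // whatever a.next previously held loses one
        // Both are atomic read-modify-writes on refcount bits in the object header.
    }

    let head = Node()
    link(head, Node())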
On the other hand, that same paper shows that for every single one of their tested workloads, the generational GC outperforms manual memory management. Now obviously, you could do better with manual memory management if you took the time to understand the memory usage of your application to reduce fragmentation and to free entire arenas at a time, but for applications that don't have the developer resources to apply to that (the vast majority), the GC will win.
I'm not saying that better memory management is the reason Android wins these launch to interactivity benchmarks because the difference is so stark relative to the hardware performance that memory management isn't nearly enough to explain it, but it does contribute to it. (My own guess is that most of the performance difference comes from smarter process initialization from usage data. Apple is notoriously bad at using data for optimization.)
Apple has decades of proven experience producing and shipping massively over-engineered systems. I believe them when they say these processors do ARC natively.
Hopefully I can clear up the discussion a little:
Q: Does reference counting 'use' less RAM than GC?
A: Yes (caveats etc. go here, but your question is a good explanation)
Q: Does the M1 in and of itself require less RAM than x86 processors?
A: No.
Q: So why are people talking about the M1 and its RAM usage as if it's better than with x86?
A: It's really just around the faster reference counting. macOS was already pretty efficient with RAM.
I'd like to propose tokamak-teapot's formula for hardware purchase:
Minimum RAM believed to be required = actual amount of RAM required * 2
N.B. I am aware that a sum that's greater than 16GB doesn't magically become less than 16GB, but it is somewhat surprising how well macOS performs when it feels like RAM should be tight, so I'd suggest borrowing a Mac or making a Hackintosh to experience this if you're anxious about hitting the ceiling.
Then they pivoted into automating retain/release patterns from Cocoa and sold it, Apple style, as a victory of RC over tracing GC, while moving the GC related docs and C related workarounds into the documentation archive.
Operative word: tried. GC was an optional desktop-only component, deprecated in Mountain Lion, which IIRC has not been accepted on the MAS since 2015, and was removed entirely in Sierra.
Without going into irrelevant weeds, "apple has always used refcounting everywhere" is a much closer approximation.
Which then in Apple style ("you are holding it wrong") turned it around in a huge marketing message, while hiding away the tracing GC efforts.
That's not exactly relevant to the subject at hand of what memory-management method software usually uses on macos.
> Which then in Apple style ("you are holding it wrong") turned it around in a huge marketing message, while hiding away the tracing GC efforts.
And people are looking to refcounting as a reason why apple software is relatively light on memory, which is completely fair and true and e.g. generally assumed as one of the reasons why ios devices fare well with significantly less ram than equivalent android devices. GCs have significant advantages, but memory overhead is absolutely one of the drawbacks.
And there's the fact that the M1 has special instructions dedicated to optimizing RC.
Memory overhead in languages with tracing GC (RC is a GC algorithm) only happens in languages like Java without support for value types.
If the language supports value types, e.g. D, and there is still memory overhead versus RC, then fire the developers or they better learn to use the language features available on their plate.
This shows latency, not memory consumption, as far as I can tell.
> If the language supports value types, e.g. D, and there is still memory overhead versus RC, then fire the developers or they better learn to use the language features available on their plate.
Memory overhead of certain types of garbage collectors (notably generational ones) is well-known and it's specified relative to the size of the heap that they manage. Using value types is of course a valid point, regarding how you should use the language, but it doesn't change the overhead of the GC, it just keeps the heap it manages smaller. If the overhead was counted against the total memory use of a program, then we wouldn't be talking about the overhead of the garbage collector, but more about how much the garbage collector is actually used. Note that I'm not arguing against tracing GCs, only trying to keep it factual.
But I think in general you could say that Apple has focused more on optimizing their OS for memory usage than the competition may have done. Android uses Java which eats memory like crazy and I suspect C# is not that much better being a managed and garbage collected language. Not sure how much .NET stuff is used on Windows, but I suspect a lot.
macOS, in contrast, is really dominated by Objective-C and Swift, which do not use these memory-hungry garbage collection schemes, nor require JIT compilation, which also eats memory.
C# is better than JVM in that it has custom value types.
Say you want to allocate an array of points in Java: you basically have to allocate an array of pointers, each pointing to a tiny 8-byte object (e.g. 32-bit float x and y coords), plus the overhead of an object header per element. If you use C# and structs, it just allocates a flat array of floats with zero overhead.
Not only do you pointlessly use memory, you have indirection lookup costs, potential cache misses, more objects for GC to traverse, etc. etc.
JVM really sucks at this kind of stuff and so much of GUI programming is passing around small structs like that for rendering.
FWIW I think they are working on some proposal to add value types to JVM but that probably won't reach Android ever.
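The same layout point can be illustrated in Swift, which also has value types; this is only an analogy to the C# case above, not a claim about C# itself, and it uses class_getInstanceSize from the Obj-C runtime, so it's Apple-platform only:

    import Foundation

    // Value type: an array of these is one flat, contiguous block of floats.
    struct PointStruct { var x: Float; var y: Float }

    // Reference type: the array holds 8-byte pointers, and each element is a
    // separately heap-allocated object with its own header (isa + refcount).
    final class PointClass {
        var x: Float
        var y: Float
        init(x: Float, y: Float) { self.x = x; self.y = y }
    }

    print(MemoryLayout<PointStruct>.stride)        // 8 bytes per element, stored inline
    print(MemoryLayout<PointClass>.stride)         // 8 bytes per element, but that's only the pointer
    print(class_getInstanceSize(PointClass.self))  // the actual object: header + fields + padding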
I can attest that structs use less memory; however, IIRC they don't have methods, so no GetHashCode(), which made them way too slow to insert into a HashSet or Dictionary.
In the end I used regular objects in a Dictionary. RAM usage was a bit higher than structs (not unbearably so) but speed improvement was massive.
You can and should implement IEquatable on a struct, especially if you plan on placing them in a hashset - the default implementation will use reflection and will be slow but it's easy to override.
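For what it's worth, the Swift analogue doesn't have that trap: Hashable/Equatable are compiler-synthesized for plain structs from their stored properties, so they work as dictionary keys without boxing or reflection. A tiny sketch (the GridKey type is made up for illustration):

    // Hashable and Equatable are synthesized automatically for this struct.
    struct GridKey: Hashable {
        var x: Int32
        var y: Int32
    }

    var counts: [GridKey: Int] = [:]
    counts[GridKey(x: 3, y: 7), default: 0] += 1
    print(counts[GridKey(x: 3, y: 7)] ?? 0)   // 1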
Given that the M1 chip was designed to better support reference counting, it makes sense that doing the same for HC could lead to a benefit
It looks like my blog post was the primary source for this (it's referenced both by this post and by the Gruber post), and to be clear, I did not claim that this helps ARM Macs use less RAM than Intel Macs. I think John misunderstood that part and now it has blown up a bit...
I did claim that this helps Macs and iPhones use less RAM than most non-Apple systems, as part of Apple's general obsessiveness about memory consumption (really, really obsessive!). This part of the puzzle is how to get greater convenience for heap allocation.
Most of the industry has settled on tracing GCs, and they do really well in microbenchmarks. However, they need a lot of extra RAM to be competitive on a system level (see references in the blog post). OTOH, RC tends to be more frugal and predictable, but its Achilles heel, in addition to cyclic references, has always been the high cost of, well, managing all those counts all the time, particularly in a multithreaded environment where you have to do this atomically. Turns out, Apple has made uncontended atomic access about as fast as a non-atomic memory access on M1.
This doesn't use less RAM, it decreases the performance cost of using the more frugal RC. As far as I can tell, the "magic" of the whole package comes down to a lot of these little interconnecting pieces, your classic engineering tradeoffs, which have non-obvious consequences over there and then let you do this other thing over here, that compensates for the problem you caused in this other place, but got a lot out etc. Overall, I'd say a focus on memory and power.
So they didn't add special hardware for NSObject, but they did add special hardware that also tremendously helps NSObject reference counting. And apparently they also added a special branch predictor for objc_msgSend(). 8-). Hey, 16 billion transistors, what's a branch predictor or two among friends.. ¯\_(ツ)_/¯
RAM capacity is just RAM capacity. Possibly Swift-made apps use less RAM compared to other apps, but the microarchitecture shouldn't matter.
My guess it's mostly faster swapping.
Microarchitecture could help, perhaps by making context switches faster.
But it could also be custom peripheral/DMA logic for handling swapping between RAM and NVM.
I think it makes sense... NVM should be fast enough that RAM only needs to act as more of a cache. But existing architectures have a lot of legacy of treating NVM like just a hard drive. Intel is also working on this with its Optane-related architecture work.
You could also do on-the-fly compression of some kinds of data to/from RAM. But I haven't heard any clues that the M1 is doing that, and you'd need applications to give hints about what data is compressible.
Most experiments with 8GB M1 Macs I've seen so far (on YouTube) seem to show things starting to slow down once the data cannot fit in RAM, although the rest of the system remains responsive, e.g. an 8K RED RAW editing test. In the same test with 4K RED RAW there was some stuttering on the first playback, but subsequent playbacks were smooth, which I guessed was a result of swap being moved back into RAM.
My guess would be they've done a lot of optimization on swap, making swapping less of a performance penalty (as ridiculous as it sounds, I guess they could even use the Neural Engine to determine what should be put back into RAM at any given moment to maximize responsiveness).
macOS has been doing memory compression since Mavericks using the WKdm algorithm, but it has also supported a Hybrid Mode on ARM/ARM64 using both WKdm and a variant of LZ4 for quite some time (WKdm compresses much faster than LZ4). I wouldn't be too surprised if the M1 has some optimization for LZ4. So far I haven't seen anybody test it.
It might be interesting to test M1 Macs with vm_compressor=2 (compression with no swap) or vm_compressor=8 (no compression, no swap) and see how it runs. I'm not sure if there's a way to change boot-args on M1 Macs, though.
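If you want to check which mode a machine is actually running, the compressor mode is exposed through sysctl. A quick Swift sketch, assuming the vm.compressor_mode name and a 32-bit value (which is what recent macOS appears to report; treat both as assumptions):

    import Darwin

    // Read the kernel's VM compressor mode; on a default install this is
    // typically 4 (compression + swap enabled).
    var mode: Int32 = 0
    var size = MemoryLayout<Int32>.size
    if sysctlbyname("vm.compressor_mode", &mode, &size, nil, 0) == 0 {
        print("vm.compressor_mode =", mode)
    } else {
        print("sysctl failed:", String(cString: strerror(errno)))
    }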
Combined, it should make a big difference to swapping.
But the much faster SSD-to-RAM transfer on the M1 means that shuffling stuff in and out of RAM is quicker, meaning RAM matters less.
The transfer speeds of the M1 SSDs have been benched at 2.7GB/s - about the same speed as mid-range NVMe SSDs (my ADATA SX8200 Pro and Sabrent Rocket are both faster and go for about $120/TB).
I expected the SSD to be way faster, but benchmarks say its SSD is below 3000MB/s read/write; not very fast, just the usual Gen3 x4 speed.
That said, I'd still be very hesitant to buy an 8 GB M1 Mac. When my iMac only had 8 GB it was a real pain to use. Increasing my iMac's memory to 24 GB made it usable.
IntelliJ has a ton of features, but it's pretty heavyweight because of them; I'd like a properly built native IDE with speed of the user experience at the forefront. That's what I loved about Sublime Text. But I also like an IDE for all the help it gives me, things that the alternative editors don't do yet.
I've used VS Code for a while as well, it's faster than Atom at least but it's still web technology which feels like a TON of overhead that no amount of hardware can compensate for.
I've heard of Nova, but apparently I installed it and forgot to actually work with the trial so I have no clue how well it works. I also doubt it works for my particular needs, which intellij doesn't have a problem with (old codebase with >10K LOC files of php 5.2 code and tons of bad JS, new codebase with Go and Typescript / React).
The native Swift retain (swift_retain above) seems to be somewhere inside this mess: https://github.com/apple/swift/blob/main/stdlib/public/runti...
The short answer for why it can’t just be an increment is because the reference count is stored in a subset of the bits of the isa pointer, and when the reference count grows too large it has to overflow into a separate sidetable. So it does separate load and CAS operations in order to implement this overflow behavior.
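A toy model of that scheme, just to make the shape of it clear. This is a deliberately simplified, single-threaded Swift sketch with made-up field names; the real runtime does the equivalent with a load plus a compare-and-swap loop on the actual header word:

    struct InlineRefCount {
        // Pretend the low 32 bits of the header word hold the count,
        // and one bit above them flags "count lives in the side table now".
        private static let countMask: UInt64   = 0xFFFF_FFFF
        private static let overflowBit: UInt64 = 1 << 32

        private var headerBits: UInt64 = 0
        private var sideTableCount: UInt64 = 0   // stand-in for the shared side table

        mutating func retain() {
            if headerBits & Self.overflowBit != 0 {
                sideTableCount += 1              // slow path: already overflowed
                return
            }
            let count = headerBits & Self.countMask
            if count == Self.countMask {
                // Inline bits exhausted: migrate the count to the side table.
                headerBits |= Self.overflowBit
                sideTableCount = count + 1
            } else {
                headerBits = (headerBits & ~Self.countMask) | (count + 1)
            }
        }
    }

    var rc = InlineRefCount()
    rc.retain()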
But they learned the lesson from SOAR (Smalltalk On A RISC) and did not follow the example of LISP machines, Smalltalk machines, Java machines, Rekursiv, etc. and build specialized hardware and instructions. The benefits of the specialization are much, much less than expected, and the costs of being custom very high.
Instead, they gave the machine large caches and tweaked a few general purpose features so they work particularly well for their preferred workloads.
I wonder if they made trap on overflow after arithmetic fast.
This is two operations and in-between the two -- if and only if the respective memory location is shared between multiple cores or caches -- some form of synchronization must occur (like locking a bank account so you can't double draft on two ATMs simultaneously).
Now the way this is implemented varies a bit.
Apple controls most of the hardware ecosystem, programming languages, binary interface, and so on meaning there is opportunity for them to either implement or supplement ARM synchronization or atomicity primitives with their own optimizations.
There is nothing really preventing Intel from improving here as well -- it is just easier on ARM because the ISA has different assumptions baked in, and Apple controls everything up the stream, such as the compiler implementations.
1. "weak" memory ordering (atomic Aquire/Release loads/stores)
2. low memory latencies between cache and system memory (so dirty pages in caches are faster updated etc.)
3. potential a coherency impl. optimized for this kind of atomic access (purely speculative: e.g. maybe sometimes updating less then a page in a cache when detecting certain atomic operations which changed a page or maybe wrt. the window marked for exclusive access in context of ll/sc-operations and similar)
Given that it's common for an object to stay in the same thread, I'm not sure how much 2. matters for this point (but it does matter for general perf.). But I guess there is a lot in 3., where especially with low-latency RAM you might be able to improve performance for these cases.
I roughly understand how refcounting causes extra damage to cache coherency: anywhere that a refcount increment is required, you mutate the count on an object before you use it, and then decrement it later. Often times, those counting operations are temporally distant from the time that you access the object contents.
I do not really understand the "pipeline bubbles" part, and am curious if someone can elaborate.
Reading on in the wiki page, they talk about weak references (completely different than weak memory ordering referenced above). This reminds me that Cocoa has been making ever more liberal use of weak references over the years, and a lot of iOS code I see overuses them, particularly in blocks. I last looked at the objc implementation years ago, but it was some thread safe LLVM hash map split 8 or 16 ways to reduce lock contention. My takeaway was roughly, "wow that looks expensive". So while weak refs are supposed to be used judiciously, and might only represent 1% or less of all refs, they might each cost over 100x, and then I could imagine all of your points could be significant contributors.
In other words, weak references widen the scope of this guessing game from just "what chip changes improve refcounting" to "what chip changes improve parallelized, thread safe hash maps."
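The pattern being referred to, for anyone who hasn't stared at much iOS code: closures capturing self weakly. Each load of a weak reference goes through the runtime's locked, hashed weak table rather than a plain retain, which is where the "over 100x" cost estimate above comes from. A minimal Swift sketch (the Downloader type is invented for illustration):

    final class Downloader {
        var onFinish: (() -> Void)?

        func start() {
            // Weak capture: the reference is registered in the runtime's weak table,
            // and reading it back is a table lookup plus a conditional retain.
            onFinish = { [weak self] in
                guard let self = self else { return }
                self.handleCompletion()
            }
        }

        func handleCompletion() { print("done") }
    }

    let d = Downloader()
    d.start()
    d.onFinish?()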
For the last few decades the industry has generally believed that GC lets code run faster, although it has drawbacks in terms of being wasteful with memory and unsuitable for hard-realtime code. Refcounting has been thought inferior, although it hasn't stopped the Python folks and others from being successful with it. It sounds like Apple uses refcounting as well and has found a way to improve refcounting speed, which usually means some sort of specific silicon improvement.
I'd speculate that moving system memory on-chip wasn't just for fewer chips, but also for decreasing memory latency. Decreasing memory latency by having a cpu cache is good, but making all of ram have less latency is arguably better. They may have solved refcounting hot spots by lowering latency for all of ram.
From Apple's site:
"M1 also features our unified memory architecture, or UMA. M1 unifies its high-bandwidth, low-latency memory into a single pool within a custom package. As a result, all of the technologies in the SoC can access the same data without copying it between multiple pools of memory." That is paired with a diagram that shows the cache hanging off the fabric, not the CPU.
That says to me that, similar to how traditionally the cpu and graphics card could access main memory, now they have turned the cache from a cpu-only resource into a shared resource just like main memory. I wonder if the GPU can now update refcounts directly in the cache? Is that a thing that would be useful?
The M1 Macs seem to be 8 channels x 16 bits, which is the same total bus width as a desktop (although running the RAM at 4266 MHz is much higher than usual). The big win is you can have 8 cache misses in flight instead of 2. With 8 CPU cores, 8 GPU cores, and a 16-core Neural Engine I suspect the M1 has more in-flight cache misses than most.
The DDR4 bus is 64-bit, how can you have a 128-bit channel??
Single channel DDR4 is still 64-bit, it's only using half of the bandwidth the CPU supports. This is why everyone is perpetually angry at laptop makers that leave an unfilled SODIMM slot or (much worse) use soldered RAM in single-channel.
> The big win is you can have 8 cache misses in flight instead of 2
Only if your cache line is that small (16 bit) I think? Which might have downsides of its own.
Less familiar with what's normal on laptops, but most desktop chips from AMD and Intel have two 64-bit channels.
> Which might have downsides of its own.
Typically for each channel you send an address (a row and column, actually), wait for the DRAM latency, and then get a burst of transfers (one per bus cycle) of the result. So for a 16-bit wide channel @ 3.2 GHz with a 128-byte cache line you get 64 transfers, one every 0.3125 ns, for a total of 20 ns.
Each channel operates independently, so multiple channels can each have a cache miss in flight. Otherwise nobody would bother with independent channels and just stripe them all together.
Here's a graph of cache line throughput vs number of threads.
So with 1 and 2 threads you see an increase in throughput; the multiple channels are helping. 4 threads is the same as 2, maybe the L2 cache has a bottleneck. But 8 threads is clearly better than 4.
Yeah, I'm saying you can't magically unify them into a single 128-bit one. If you only use a single channel, the other one is unused.
I've seen similar on Intel servers, but not recently. This isn't however typically something you can do at runtime, just boottime, at least as far as I've seen.
But doesn't that only help if you have parallel threads doing independent 16 bit requests? If you're accessing a 64 bit value, wouldn't it still need to occupy four channels?
So striping a cache line across multiple channels does increase bandwidth, but not by much. If the DRAM latency is 70 ns (not uncommon) and your memory is running at 3.2 GHz, then on a single 64-bit wide channel you get 128 bytes in 16 transfers. 16 transfers at 3.2 GHz = 5 ns, so you get a cache line back in 75 ns. With two 64-bit channels you can get 2 cache lines per 75 ns.
So now with a 128-bit wide channel (twice the bandwidth) you wait 70 ns then get 8 transfers @ 3.2 GHz = 2.5 ns. So you get a cache line back in 72.5 ns. Clearly not a big difference.
So the question becomes: for a complicated OS with a ton of cores, do you want one cache line per 72.5 ns (the striped config) or two cache lines per 75 ns (the non-striped config)?
In the 16-bit, 8-channel config (assuming the same bus speed and latency) you get 8 cache lines per 90 ns. However, I'm not sure what magic Apple has, but I'm seeing very low memory latencies on the M1, on the order of 33 ns! With all cores busy I'm seeing cache line throughput of about one cache line per 11 ns.
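Here is the arithmetic from the last few paragraphs spelled out, so the numbers are easy to check. The 70 ns latency, 3.2 GT/s rate and 128-byte line are the assumptions used above, not measurements:

    let latencyNs = 70.0          // DRAM latency assumed above
    let transfersPerNs = 3.2      // 3.2 GT/s bus: transfers per nanosecond
    let cacheLineBytes = 128.0

    // Time to return one cache line on a channel of the given width.
    func fetchTimeNs(channelBits: Double) -> Double {
        let bytesPerTransfer = channelBits / 8.0
        let transfers = cacheLineBytes / bytesPerTransfer
        return latencyNs + transfers / transfersPerNs
    }

    print(fetchTimeNs(channelBits: 64))   // ~75 ns   (one miss in flight per channel)
    print(fetchTimeNs(channelBits: 128))  // ~72.5 ns (two channels striped together)
    print(fetchTimeNs(channelBits: 16))   // ~90 ns   (but 8 narrow channels => 8 misses in flight)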
Sounds really cool.
I'm not familiar with macOS; are the apps there mostly managed code? Even if they were, and even if refcounting on Mac is that much faster than refcounting on PC, refcounted code would still lose to manual memory management on average.
So low latency from the cache to system RAM can help here, at least for cases where the RC is shared between threads, but also when the object is not shared between threads yet the thread is moved to a different CPU. Still, it's probably not the main reason.
Given how atomics (might) be implemented on ARM, and that the cache and memory are in the same package, my main guess is that they did some optimizations in the coherency protocol/implementation (which keeps the memory between caches and the system memory/RAM coherent). I believe there is a bit of potential to optimize for RC, i.e. to make that usage pattern of atomics fast. Lastly, they probably take special care that the atomic-related instructions used by RC are implemented as efficiently as possible (mostly fetch_add/fetch_sub).
> which keeps the memory between caches and the system memory/RAM coherent
Isn't this already true of every multi-core chip ever designed? The whole point of coherency is to keep the RAM/memory coherent between all the cores and their caches.
> Isn't this already true of every multi-core chip ever designed;
Yes, I just added the explanation of what coherency is in this context as I'm not sure how common the knowledge about it is.
The thing is, there are many ways you can implement this (and related things), with a number of parameters involved which can probably be tuned to optimize for RC's typical usage of atomic operations. (Edit: Just to be clear, there are constraints on the implementation imposed by it being ARM compatible.)
A related example (not directly atomic fetch_add/sub and not directly coherency either) would be the way LL/SC operations are implemented. Mainly, on ARM you have a parameter for how large the memory region "marked for exclusive access" (by an LL load operation) is. This can have major performance implications, as it directly affects how likely a conditional store is to fail because of accidental interference.
I'm curious to understand this. Is this because of a specific instruction set support, or just the overall unified memory architecture?
For the price, it better run circles and squares. It should cook my dinner too.
Look, want to know how the M1 achieves its results? Easy. Apple is first with a 5nm chip. Look at the past: every CPU maker gains both speed and power efficiency when going down a manufacturing node.
Intel CPUs were still using a 14nm node (although they call it 14nm+++) while the Apple M1 is now at 5nm. According to this chart, that's a transistor density increase of at least 4x.
Not saying Apple has no CPU design chops; they've been at it for their phones for quite a while. But people are just ignoring the elephant in the room: Apple gives TSMC a pile of cash to be exclusive for mass production on their latest 5nm tech.
A reference counting strategy would be more efficient in processor utilization compared to garbage collection as it does not need to perform processor intensive sweeps through memory identifying unreferenced objects. So reference counting trades memory for processor cycles.
It is not true that garbage collection requires more RAM to achieve equivalent performance. It is in fact the opposite. For programs with identical object allocations, a GC-based system would require less memory, but would burn more CPU cycles.
I think it’s the reverse.
Firstly, garbage collection (GC) doesn’t identify unreferenced objects, it identifies referenced objects (GC doesn’t collect garbage). That’s not just phrasing things differently, as it means that the amount of garbage isn’t a big factor in the time spent in garbage collection. That’s what makes GC (relatively) competitive, execution-time wise. However, it isn’t competitive in memory usage. There, consensus is that you need more memory for the same performance (https://people.cs.umass.edu/~emery/pubs/gcvsmalloc.pdf: with five times as much memory, an Appel-style generational collector with a non-copying mature space matches the performance of reachability-based explicit memory management. With only three times as much memory, the collector runs on average 17% slower than explicit memory management)
(That also explains why iPhones can do with so much less memory than phones running Android)
Secondly, the textbook implementation of reference counting (RC) in a multi-processor system is inefficient because modifying reference counts requires expensive atomic instructions.
Swift programs spend about 40% of their time modifying reference counts (http://iacoma.cs.uiuc.edu/iacoma-papers/pact18.pdf)
So, reference counting gets better memory usage at the price of more atomic operations = less speed.
That last PDF describes a technique that doubles the speed of RC operations, decreasing that overhead to about 20-25%.
It wouldn’t surprise me if these new ARM macs use a similar technique to speed up RC operations.
It might also help that the memory model of ARM is weaker than that of x64, but I’m not sure that’s much of an advantage for keeping reference counts in sync across cores.
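To get a feel for the atomic-operation overhead being discussed (the ~40% figure above), here is a rough micro-benchmark sketch in Swift. It assumes the swift-atomics package for ManagedAtomic; the absolute numbers vary wildly by machine, and the optimizer can distort the non-atomic loop, so treat it as illustrative only, not a measurement methodology.

    import Atomics
    import Foundation

    let iterations = 10_000_000

    var plain = 0
    let t0 = DispatchTime.now()
    for _ in 0..<iterations { plain &+= 1 }
    let t1 = DispatchTime.now()

    let counter = ManagedAtomic<Int>(0)
    for _ in 0..<iterations { counter.wrappingIncrement(ordering: .relaxed) }
    let t2 = DispatchTime.now()

    print("plain result:", plain)   // keep the first loop from being optimized away
    print("non-atomic:", Double(t1.uptimeNanoseconds - t0.uptimeNanoseconds) / Double(iterations), "ns/op")
    print("atomic:    ", Double(t2.uptimeNanoseconds - t1.uptimeNanoseconds) / Double(iterations), "ns/op")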
True, reference counting stores references… but garbage collection stores garbage, which is typically bigger than references :)
(Unless you’re thinking of a language where the GC gets run after every instruction - but I’m not aware of any that do that, all the ones I know of run periodically which gives garbage time to build up)
I've seen this with Java, where the memory usage graph looks like a sawtooth, with 100s of MB being allocated and then freed up a couple of seconds later.
In the tracing GCs I have seen, an "object header" must be stored with every object in the system; the GC needs it to know which parts of the object are references which should be traced. So while reference counting needs extra space to store the reference count, tracing GC needs extra space to store the object header.