How does macOS manage virtual cores on Apple Silicon? (eclecticlight.co)
237 points by todsacerdoti on Oct 23, 2023 | 168 comments



It's interesting to see AMD's take on big-little. The only difference between their Zen 4 (big) and Zen 4c (little) cores is speed and thermal characteristics, owing to the same architecture (RTL) being implemented by both, with only the layout and cell density differing.

https://youtu.be/h80TB8K-Rfo?t=260


I like this approach (especially because it supports AVX-512). However, even not counting Zen 4c, AMD 3D V-Cache CPUs have cores that are sufficiently heterogeneous (those with and without the 3D cache) that special scheduling is desirable for some use cases (gaming on Windows).


Great video! Thanks for the link.

By the way, I was confused by how you equated architecture and RTL. In this context I use "architecture" to mean ISA, and "microarchitecture" to refer to how its performance characteristics, e.g. IPC, are realized.

You seem to know more than I do and I'd love for you to comment. Thanks!


Is that meaningfully different from Apple’s take?


I don’t think so. But it is different from some Intel CPUs where the efficiency cores didn’t support all the vector ops the performance cores did, causing issues with some software.


Was the industry ready for this concept of a computer having a number of meaningfully different kinds of cores? Has this happened before? Or did application developers just get cores as an integer count and that was it?


> Was the industry ready for this concept of a computer having a number of meaningfully different kinds of cores?

The industry didn’t have a choice. The market was demanding higher performance within the same thermal envelope and the same energy consumption. These are mobile devices, you can’t put a bigger heat sink on it and then crank up the power.

You can find tons of academic literature discussing the necessity of this development (along with many other things that have come to pass), how it would work, etc. in the decade leading up to the introduction. ARM didn’t just release it to the world and say “Surprise!”

We knew it was coming, we just didn’t do the best job of preparing for it.


Why not just run some of the high-performance cores at a much lower clock rate?

Much smaller silicon and software changes would have been needed to allow for just two different clock rates.

What was the argument for designing two different types of cores instead?


High performance cores use a lot of transistors and take up a lot of space. When you are aiming for high performance, this is a good tradeoff.

The efficiency cores are physically smaller, cheaper, etc. Once you reduce the performance expectations to the point where they satisfy the requirements, they are a much better choice. The efficiency core has more perf-per-watt than a clocked-down performance core.

It seems counterintuitive. But when your budget for transistors is X and your budget for power draw/heat dissipation is Y, and there is no leeway whatsoever, the big.LITTLE concept gets you more aggregate performance.


Can you link an analysis for that, where small cores have more perf-per-watt than a clocked-down big core?

From what I understand, the extra transistors mostly aren't for enabling the higher clock speed at all; they allow for wider cores that do more per cycle. It doesn't seem clear at all that such a core would be less efficient than a narrower design that does less per cycle, if clocked much lower to yield the same final performance.


AMD’s approach is similar to this. The efficiency core is the same core with less cache and lower clocks. It’s not a different microarchitecture like in ARM and Intel designs.


It's been around since 2011 on Android. Nothing new. https://en.wikipedia.org/wiki/ARM_big.LITTLE


General term is asymmetric multiprocessing, which goes back a few more decades:

https://en.wikipedia.org/wiki/Asymmetric_multiprocessing

IIRC big.LITTLE implementations tended to have cores that didn't support the same instruction sets, meaning you couldn't migrate tasks between them if you needed to. Kind of like how laptops could switch between integrated and discrete GPUs, but some users would need to switch to the discrete GPU to use an external monitor even if they didn't want the power hit.

Also, "big.LITTLE" is a pretty strange brand name.


I think the term you're going for here is heterogeneous multi-processing/computing, not asymmetric multiprocessing:

https://en.wikipedia.org/wiki/ARM_big.LITTLE#Heterogeneous_m...

https://en.wikipedia.org/wiki/Heterogeneous_computing

It even lists big.LITTLE there as a typical example in the second article. big.LITTLE itself never had different ISAs as far as I could tell, just scheduling caveats that lead to efficiency tradeoffs, like the first article mentions.


My understanding of "heterogenous computing" is that it's more about splitting a task across a CPU and coprocessors, or writing the same code to target both. Asymmetric means there's multiple CPU cores but they're not equally performant.


big.LITTLE is typically cores with the same architectural features, just different performance due to different microarchitecture. A7/A15/A17, or A53/A57/A72, or A55/A76.

This let them run the same code, even the same system level code. The scheduler only had to optimize performance, without tasks being pinned to one core or another for correctness.


This is the case in Intel 12th gen: P cores support AVX512, E cores do not. Instead of adding OS-level support, Intel let you gain back AVX512 by disabling the E cores. Intel ended up disabling that feature in hardware later, though.


This was honestly a pretty surprising mistake for a big player like Intel. Quite a big oversight in my opinion. The fact that the OS will migrate processes from big cores to little cores should not be a surprise, given that it is basically necessary for the power savings to be realized. That effectively necessitates having the same ISA on both the big and little cores. It's not like desktop class operating systems haven't been running on these types of cores for over a decade.


Not to be confused with Little Big - https://en.wikipedia.org/wiki/Little_Big_(band)


Not to be confused with LittleBigPlanet.


Yet nobody in the Windows/Mac world was ready (if you think they should have to care, which maybe they shouldn't).


Nobody in the Windows/Mac world writing software should have to care about the change, or even know about it. That's the beauty of an OS scheduler: it is optimized to execute your instructions where it sees fit, and higher-level programs don't need to know how it works, where it executes your instructions, or that it even exists. Sometimes abstraction and separation of concerns can be a beautiful thing. I'd argue that most programs shouldn't make any decisions about where to run; I know I don't want random engineers for some Mac app controlling my power consumption or setting priority above anything else I run. The kernel developers, with their stream of health and performance data from the intimately connected hardware (on Apple), are who I want making the call on where to run. In the very few situations where E- or P-core assignment matters (games and VMs are all I can think of, and even that is arguable imo), there has been the ability to inform the OS scheduler where to run for a very long time.

Tangent: With today's insanely powerful hardware you should not ever be constrained in your programs to the point of having to consider setting core affinity, and if you go down that road you might want to reevaluate what you're doing, because you're probably doing something wrong and blaming it on the OS scheduler. Even constrained to the E cores (which are still plenty fast), your programs should perform well. I think more developers need to start writing software on slow machines on purpose, because too many apps are written on machines that cost $4000+ with the newest chip and GPU innovations and are never tested on the slower, more commonly used hardware, so things end up being dog slow on those machines and get no attention. If more apps were built on slow hardware and were still fast, they'd be even more so on the $4k machines. The MacBook Air was great for this because it was fanless and every core was essentially an "E" core, forcing you to optimize your code for the selfish reason of not making it annoying to run while you were writing it. Even if selfish, the net result was production code that was blazing fast once deployed on server-grade hardware.
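
Coming back from the tangent, to make the "inform the scheduler" point concrete: a minimal sketch (plain C GCD; the queue label is arbitrary) of the kind of hint apps can give. Tagging a dispatch queue with a low QoS class tends to keep that work on the E cores, while the final placement decision stays with the OS.

  #include <dispatch/dispatch.h>
  #include <stdio.h>

  int main(void) {
      // A serial queue tagged QOS_CLASS_UTILITY: a hint that this work is
      // non-urgent, which the scheduler may use to prefer efficiency cores.
      dispatch_queue_attr_t attr =
          dispatch_queue_attr_make_with_qos_class(DISPATCH_QUEUE_SERIAL,
                                                  QOS_CLASS_UTILITY, 0);
      dispatch_queue_t q = dispatch_queue_create("com.example.background-work", attr);

      dispatch_async(q, ^{
          // Long-running, non-urgent work goes here.
          puts("running at utility QoS");
      });

      dispatch_main(); // never returns; fine for a demo
  }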


Abstractions leak. Many devs don't need to care, but anyone particularly concerned with performance or power consumption will want to know the details. (Just like many devs don't need to know about CPU cache, how GPUs work, etc.)


If instructions don't run the same between heterogeneous cores, i.e. one core supports instructions the other doesn't, then unless this information is available to the scheduler, there's no way it can make that decision. AVX512 was such an example, and sadly it ended up getting locked off from the P cores.


Something made the decision if it got "locked off from the P cores", what was it if not the OS scheduler?


What got locked off was the P-core's AVX512 instruction support, not some process's core affinity. Intel didn't have an E-core design ready to support AVX512 in Alder Lake. They initially allowed AVX512 to be enabled on the P-cores if all the E-cores were disabled at boot, but they later switched to fusing off AVX512 support permanently.


Interesting, but what does this have to do with what my original comment was about?


> today's insanely powerful hardware you should not ever be constrained in your programs to the point of having to consider setting core affinity

That is wrong in an HPC environment.

Most programs tailored for HPC will set core affinity manually. And they have very good reason to do so (cache affinity, memory bindings).

The correct abstraction depends on your domain. There has indeed been very little incentive up to now to play with core affinity and the scheduler in a classical desktop environment.

The story is different on systems with strong constraints on efficiency.
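
For anyone curious what that looks like in practice, here's a minimal sketch of hard pinning on Linux, where most HPC codes run (macOS doesn't expose an equivalent; it only has the affinity-tag hint discussed elsewhere in this thread):

  #define _GNU_SOURCE
  #include <sched.h>
  #include <stdio.h>

  int main(void) {
      // Pin the calling thread to CPU 2 so it keeps its cache and NUMA locality.
      cpu_set_t set;
      CPU_ZERO(&set);
      CPU_SET(2, &set);

      if (sched_setaffinity(0, sizeof(set), &set) != 0) {
          perror("sched_setaffinity");
          return 1;
      }

      // ... run the hot loop here; the kernel won't migrate this thread ...
      return 0;
  }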


I don't want you to determine where my programs on my computer run, though. I want my operating system to do that: it has much more context about the hardware, a very smart scheduler in place, it honors my preferences for how to operate on AC or battery power, and it takes into account the scheduling overrides I set via AppTamer or similar. You as the developer of the application I'm running shouldn't mess with what cores it runs on; you should just make sure your program can be used across all of them, and that's where your reach should end. That's where my reach should end if I am writing a program you will run. The execution of the process is up to the user running the process and the OS, not the person who wrote the program. Maybe that's not an opinion many people share with me, though.


Yep, but that has nothing to do with HPC. Those are supercomputing applications and you're not running those on your laptop or desktop. In HPC scenarios the developer knows more than the operating system.


Ah, I missed that we were talking about super computers. I thought we were talking about consumer gear


Also applicable for smaller scale HPC applications running on a single workstation, like an M2 Ultra.


On the Mac, the Grand Central Dispatch library has been around since 2009. So if developers already used that, they were ready for M1: https://en.wikipedia.org/wiki/Grand_Central_Dispatch


GCD was actually not designed for this scenario, and it's easy to get wrong.

It was designed for a large number of equal cores (aka SMP), meaning people did dispatch_async to the default concurrent queues all over the place, which is a bad pattern when you have to shrink down to phone size. Also, dispatch_semaphore has priority-inversion problems, and a lot of other features (like dispatch_group) are semaphores in disguise.

It does work if you use it carefully, but Swift concurrency is a different design for a reason.


According to the Apple developer documentation, GCD is currently designed for this scenario:

https://developer.apple.com/documentation/apple-silicon/tuni...

"On a Mac with Apple silicon, GCD takes into account the differences in core types, distributing tasks to the appropriate core type to get the needed performance or efficiency."


It is now, but that's mostly from evolution in the iOS era. But like I said, it's not perfect.


That's probably assuming you set the QoS correctly.


And dispatch_async loses QoS while dispatch_sync keeps it, but people often used async unnecessarily.
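
You can observe it yourself: qos_class_self() reports the QoS class the current code is running at, so a quick probe like this (just a sketch; the exact propagation rules depend on how the target queue was created, and the queue label is arbitrary) makes the difference visible:

  #include <dispatch/dispatch.h>
  #include <pthread/qos.h>
  #include <sys/qos.h>
  #include <stdio.h>

  static void report(const char *label) {
      // qos_class_self() returns the QoS class the caller is currently running at.
      printf("%s: qos_class_self() = 0x%x\n", label, (unsigned)qos_class_self());
  }

  int main(void) {
      // Raise the main thread's QoS so propagation (or its absence) is visible.
      pthread_set_qos_class_self_np(QOS_CLASS_USER_INITIATED, 0);
      report("main thread");

      // A plain serial queue with no QoS of its own.
      dispatch_queue_t q = dispatch_queue_create("com.example.probe", DISPATCH_QUEUE_SERIAL);

      dispatch_sync(q, ^{ report("dispatch_sync block"); });   // runs on the caller's thread
      dispatch_async(q, ^{ report("dispatch_async block"); }); // runs on a GCD worker thread

      dispatch_main(); // keep the process alive for the async block (demo only)
  }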


>dispatch_async loses QoS

Wait really? Where can I read more about this, this goes against what I would assume.

Edit: Or do you mean that when dispatch_async the block is run with the QoS of target queue instead of source? That is what I would normally expect, if you want to "inherit priority" then dispatch_sync would do that, at the expense of blocking.


Android likes to do things just to check off boxes, and in mediocre ways.


Heterogeneous computing is as old as computer science itself. There are well-known algorithms for dealing with this.


OTOH I'd question whether we as an industry are up to using it effectively, given that we couldn't even handle symmetric multiprocessing very well:)


iPhones and iOS have had this for quite a while. My iOS dev knowledge is rather dated now but IIRC Grand Central Dispatch let you indicate the type of workload your task needed and thus which core type it was typically scheduled on.


GCD first appeared in OS X 10.5 or 10.6, as I remember.


GCD first appeared in 10.6 Snow Leopard, which was marketed as a bug-fix-only release and is for that reason romanticised to this day, but in reality it included major changes under the hood and wasn't very stable in its first versions.


Game consoles had different kinds of cores/processors.


I mean ever since very early on the GPU was a "separate CPU with different kinds of cores" compared to the main CPU.


I don't think macOS allows a userspace program to set thread affinities.

Not only are the functions commented out, there's this note

  THREAD_AFFINITY_POLICY:

  This policy is experimental.
  This may be used to express affinity relationships
  between threads in the task. Threads with the same affinity tag will
  be scheduled to share an L2 cache if possible. That is, affinity tags
  are a hint to the scheduler for thread placement.
  
  The namespace of affinity tags is generally local to one task. However,
  a child task created after the assignment of affinity tags by its parent
  will share that namespace. In particular, a family of forked processes
  may be created with a shared affinity namespace.
https://github.com/apple-oss-distributions/xnu/blob/main/osf...
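
For reference, this is roughly how that affinity hint is set from userspace on platforms where it's honored; just a sketch, since on Apple Silicon the policy is reportedly not supported and the call simply fails:

  #include <mach/mach.h>
  #include <mach/thread_policy.h>
  #include <pthread.h>
  #include <stdio.h>

  int main(void) {
      // Tag this thread with affinity tag 1. Per the header comment above, the
      // tag is only a hint that threads sharing it would like to share an L2.
      thread_affinity_policy_data_t policy = { .affinity_tag = 1 };

      kern_return_t kr = thread_policy_set(pthread_mach_thread_np(pthread_self()),
                                           THREAD_AFFINITY_POLICY,
                                           (thread_policy_t)&policy,
                                           THREAD_AFFINITY_POLICY_COUNT);
      if (kr != KERN_SUCCESS) {
          // Reportedly the outcome on Apple Silicon.
          fprintf(stderr, "thread_policy_set failed: %d\n", kr);
          return 1;
      }
      return 0;
  }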


Here's one example of userspace programs assigning affinity to specific cores

> Use it to automatically run busy background apps on the M1 or M2's efficiency cores to save power, leaving the performance cores for the apps you want to run fastest.

https://www.stclairsoft.com/AppTamer/


Unfortunately you can't do the reverse and pin to a P core.


You can in a way: if you tie up the E cores by pinning work to them exclusively, the scheduler will send other stuff to the P cores as it tries to avoid the resource contention. It's not power-efficient to sit and wait for an efficiency core when you can buzz it through a P core super quick and get back to idle, I would imagine haha


big.LITTLE is a good architectural innovation, although I don't think macOS and Android have been making good use of it. It is a tricky hardware feature to manage in software overall. Virtualization adds even more complexity.


Do you have a reason why macOS doesn't make good use of big.LITTLE?

As far as I know, it allocates background/low-priority tasks, such as OS housekeeping, to the little cores. When you're in low power mode or on low battery, it automatically shifts work to the little cores to save energy. When you need extra multithreaded power, it uses all cores.


Sort of related: are there any POSIX API extensions planned to allow a program to set thread hints in terms of whether it should run on an E core or a P core? Or maybe even a scale like 0-255?
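
For context, the closest thing I'm aware of today is macOS's non-portable per-thread QoS, which is a coarse version of that scale; something like this (a sketch, with the _np suffix marking it as non-portable):

  #include <pthread.h>
  #include <pthread/qos.h>

  static void *worker(void *arg) {
      (void)arg;
      // QOS_CLASS_BACKGROUND is the strongest "prefer the efficiency cores"
      // hint a thread can give; the scheduler still makes the final call.
      pthread_set_qos_class_self_np(QOS_CLASS_BACKGROUND, 0);

      // ... low-priority work here ...
      return NULL;
  }

  int main(void) {
      pthread_t t;
      pthread_create(&t, NULL, worker, NULL);
      pthread_join(t, NULL);
      return 0;
  }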


Sounds like this explains why Docker eats thru Mac battery so fast


Check out App Tamer[1] to control the priority of processes as well as core assignment on the M-series chips:

> App Tamer can take special advantage of Apple Silicon powered Macs, which have two different types of processor cores. Use it to automatically run busy background apps on the M1 or M2's efficiency cores to save power, leaving the performance cores for the apps you want to run fastest.

1. https://www.stclairsoft.com/AppTamer/


I get 6h or so running quite a few workloads in Kubernetes on Docker For Mac on Apple Silicon. Intel was < 2h. The rest of my team has similar experiences. Are you sure it's Docker causing your battery issues?


Docker is definitely a problem, although newer versions have implemented new functionality that may have significantly reduced this. I can't really tell however, as I'm stuck on too many Zoom calls all day long and Zoom absolutely murders my battery life.


I haven’t tested it personally but you may be able to get some of that battery life back by using the web version of Zoom in Safari. That tends to force video chat services to use hardware accelerated video codecs which are considerably easier on battery.


And that's not just for Macs; the same can help on Linux devices. It's somewhat hilarious that running a non-native app gets you better hardware accel, but here we are.


I don't know about zoom specifically, but aren't most web conferencing "native" apps just electron anyway?


Indeed they tend to be; however, the underlying Electron base tends, in my experience, to be outdated and comparatively hard to configure.


Maybe the web browser is the OS now. Does that make the OS just a hardware abstraction layer?


I think that's generally accepted as true for some classification of people.

That's why Chromebooks are an idea that was perhaps too early (and the hardware too anaemic)


Try orbstack


Orbstack still seems to rely on docker's virtualization. We need something better than docker to improve on this.


Orbstack runs Docker inside its own lightweight VM. It doesn't use any of Docker Desktop's virtualization. I'm not sure if it handles scheduling any better though.


Does it make use of the E cores on Apple Silicon?


(dev here) Yes. I've spent quite a bit of time on scheduling. It's more nuanced than what this article says and there are ways to influence it (but sorry, can't share too many details). OrbStack tends to use E cores more than other virtualization-based apps do on Apple Silicon, especially for heavy workloads.


Thank you! This is really interesting, will give OrbStack a test drive.


Since all docker implementations use a VM on Mac, and according to this article VMs only use P-Cores, I have to assume it only uses the P cores.


This article is based on an M1 Max but I think M2 Max behaves differently here since it has more E-cores. Don't quote me though.


> With the introduction of Game Mode in Sonoma, CPU scheduling can now work differently, with E cores being reserved for the use of games.

- shouldn't that be 'P cores'?


That was my first impression too, but it actually makes sense. According to a linked article by the same author:

> it’s apparent that during Game Mode, the game was given exclusive use of the two E cores, and threads from other processes fixed at low QoS, which would require them to be run on the E cores, were kept waiting. The game’s threads were run on a combination of E and P cores, with much of their load being concentrated on the E cores. This appears to be energy-efficient, and ideal for use on notebooks running on battery power.

So: the E cores are reserved for the use of games, but the game still makes use of some P cores.

https://eclecticlight.co/2023/10/18/how-game-mode-manages-cp...


I like the sound of this! I play a lot of older games on my MacBook Air and one of my biggest annoyances is when the game gets the machine to heat up (and in previous iterations spin up the fan) despite being a very graphically undemanding game.

I think there are a lot of otherwise graphically simple games that poll or sit there in busy wait loops or render more frames than the display can show. It’s like they’ve been built for a resource starved environment and don’t know how to properly handle abundance. It’s really frustrating!


I did notice that my MBA M1 running Baldur's Gate 3 heats up much less under Sonoma, which moreover makes the performance more consistent (it would heat up and throttle whatever the resolution/settings, which meant it was actually better to run in low power mode). I had no idea it was because of the OS update.


On my PC, I limit my FPS to 24 (via graphics card settings). It uses far less power, and runs considerably cooler.


This is bad, in a gaming setting, for several reasons.

You should at least make your frame limit be equal to your screen refresh rate.


Meh, gsync keeps it sane. But no, my screen refresh rate is 165Hz. If it were running at that many fps, it would result in ~200 watts of power being used. Maybe this is "good" for gaming, but it is definitely bad for my wallet, which has to pay the power bill. Maybe when electricity prices go back down, I'll turn it up.


> Meh, gsync keeps it sane.

GSync isn't magic. It can only do so much. In this situation, where you've tied your frames to 24 (for some reason?) and your monitor refresh rate is 165Hz, GSync isn't going to save you. GSync will help you if there are sudden, brief drops in FPS below the refresh rate and keep things in sync. It will not save you if your refresh rate is 165Hz and your frame rate is 24...

> Maybe when electricity prices go back down, I'll turn it up

I don't think you'll notice the extra $5 a month it costs to run that a couple hours a week.

We should at least use a real reason why you'd do this. Saving a smidge of power is not one of them.

You should, instead, under-clock your monitor refresh rate to something low, like 75 or 60Hz, then frame-limit to that number as well.


Gsync/variable refresh rate should be able to make that work. Even if the display doesn't go down to 24Hz, it only needs to drop to 144Hz to be an integer multiple of 24Hz - you'd just get each frame being displayed for 6 refresh cycles.


How high is the electricity cost in your area for you to notice an extra 200 watts during gaming? Assuming $0.30/kWh, if you play 8 hours every day (which is a lot), you'll increase your monthly bill by $14.40. If you only play 2 hours every day on average, the increase is only $3.60.


As I mentioned in another comment down-thread, my costs (after taxes + fees) is €0.60 a kWh.


Ok so $7 a month then. What you're doing to "save costs" is absurd, and the approach is entirely incorrect.

Downclock your monitor to something like 60Hz, then pin your frames there too.


I'm not sure how you reached $7...

0.2 kW * 8 hours * 30 days * €0.60 =

1.6 kWh/day * 30 days * €0.60 =

48 kWh * €0.60 =

€28.80 per month

I game about 3-6 hours a day, but I also do a lot of (Unreal) programming on the same computer. So according to measurements, it draws about 2-3 kWh a day. Without limiting the framerate, it's closer to 10 kWh per day. We're talking a savings of ~€4 per day. That's over €100 per month in savings.

> Downclock your monitor to something like 60Hz, then pin your frames there too.

I'm not sure what my monitor rate has to do with fps. They are two totally independent measurements that are very rarely synced if you are doing any kind of graphically intensive work, even if you pin them to the same rate. The monitor will still double-post frames every so often, or even skip a frame. Another good example is a paused video, which is 0fps, but the monitor doesn't care. It just keeps showing the same frame over and over again. The same thing happens here, and with (g|v)sync, there's never any tearing.


Monitor refresh rate and FPS your GPU can generate are very related[1].

Again, your setup is so far from ideal, you should reconsider. Reduce your refresh rate to some multiple of 24 if you insist on 24 for some unknown reason.

It's one of those situations where you are being so clever you're hurting yourself and not realizing it.

[1] https://www.displayninja.com/what-is-screen-tearing/


I think you are lacking some basic fundamentals and/or not reading what I’m writing.

Gsync is enabled. There is no tearing. Monitor refresh rate and fps have literally nothing to do with each other when the fps is less than the refresh rate.


It has to be a divisor of the refresh rate, not necessarily equal. E.g. with 30fps on a 60Hz screen you will do fine. But with a high-refresh-rate screen and gsync/freesync on top of that, it most probably does not matter at all because the refresh rate of the screen should adjust.


Outside of networked multiplayer games, I can’t think of any reason this would be “bad.”


Even slow paced games will experience tearing. Here, the parent post has inadvertently made their setup beyond non-ideal.

They picked a random 24 FPS target (maybe based off old movies for some reason?), but their monitor refreshes at 165Hz. The mis-match of frames and refreshes is so off, they will experience tearing even with mostly-static backgrounds.

Basically, nothing the parent poster is doing makes sense. Neither for power savings nor viewing pleasure.


> They picked a random 24 FPS target (maybe based off old movies for some reason?)

Not random, and not "old movies". Modern day movies are still 24fps, it's part of what makes movies look the way they do, motion blur.

> but their monitor refreshes at 165Hz

So? It refreshes the image 165 times a second, regardless of video framerate. It just so happens that many of those 165 refreshes in this case will be the same image over and over. Pause a video, so 0 fps, or 1 fps depending on your take, the monitor is still refreshing 165 times a second because it doesn't have anything to do with the video in this case.


The problem is the mis-alignment of frames being pushed into the buffer and the monitor drawing them.

You will experience tearing with this much of mis-alignment. It's why things like GSync/FreeSync even exist.

> motion blur

You can enable motion blur in most game settings...


I thought tearing occurred from trying to run display frame rates beyond the refresh rate of the monitor? A frame rate below the refresh rate should have ample opportunity to flush correctly, or I’m not understanding the tech in play maybe.

If the game is an RPG or any casual genre where timing is not important and the lack of it doesn't impact gameplay or competitiveness, what's it matter if they run at 24fps? Maybe they run that low to screen-capture at a lower frame rate to not needlessly burn CPU converting 60 or 160 fps down to 24? And most games cap at 30 or 60, yet higher-refresh-rate monitors don't have tearing issues or everyone would be throwing a fit. My son has a 120Hz monitor and plays countless games capped at 60. Is it the fact they all divide into the refresh rate that stops the issue?


https://www.displayninja.com/what-is-screen-tearing/

It happens in both directions.

Yes, staying at some multiple of your refresh rate is the general advice. So 60 would be ok on a 120Hz screen.

The OP's comment that started all this is doing 24fps at 165Hz, which is a ratio of 6.875. They claim to be in the gaming world too, which makes their bizarre decision even more puzzling. Just don't do this.


Cargo culting is the best, which is what you are doing here. /s

Even if it is exactly a multiple or even the same number, without vsync, you’ll still get tearing. Why? Because fps is a dynamic number. It’s a measure of how many frames your card can generate in a second. By the time you measure, the frames are already output and gone from the buffer.

The refresh rate on the monitor is a constant, never changing value.

Even if you set the fps to 30, you’ll sometimes render 30.1 or 29.9 frames in a second due to random jitter from the geometry/shader complexity quickly changing on the screen.

There is no such thing as constant fps.


That article badly misunderstands what vsync is. All vsync does is tell the GPU to wait until the monitor is done displaying a frame before switching buffers. That’s it. Turning on vsync eliminates tearing no matter what frame rate or what form of multiple buffering is used.


Not tearing. It can still utilize vertical sync. If done naïvely the time between frames would jitter a bit though. At 165Hz it wouldn’t be too bad.


That's kinda like saying to save money you eat only plain rice, uncooked. Like yes it will technically work but most people would not be willing to subject themselves to that.


That sounds horrifying. I can't comfortably watch video at 24 fps, let alone play an interactive game. But to each his own, who am I to dictate your preferences?


I'm not the GP, but it makes sense to me. I can't even see a difference between 24 and 60 FPS. It has to get down into the teens before I can tell. I haven't capped my graphics like that, but that's more because it never occurred to me than because it would bother me.


Honest question, not intended to be disparaging: do you not see a difference between the various animations on [1], for example? To me, they are very apparent.

[1] https://www.testufo.com/


This website doesn't do things justice and doesn't seem very accurate. For example, I tried this on my Linux work computer (which doesn't offer vsync, or have a graphics card) and compared it to my gaming PC. The non-vsync version looked much better than the vsynced version.

I went outside and took a 4k video at 24fps and played it back. It looks smooth as butter, just like a movie. Then I took the same video, but recorded at 60fps and re-encoded it in 24fps. It looks like the example on this website, where it looks jumpy.

I assume this is because of motion blur.


The bottom one looks different. Like I said, I can tell once the FPS gets into the teens. The top two look exactly the same to me. I realize they are different, but I can't see it.


https://www.testufo.com/framerates#count=4&background=stars&...

This is better, because we're talking about the lower range right now anyway.

That said, I think the difference between 144Hz and 72Hz is a lot more subtle and can't be seen if you try to glance at them directly. Try viewing them with peripheral vision and there's a difference in smoothness. That shows up in things like moving a window around (or even just moving a mouse). It makes a huge difference in input latency/response, but the difference can't be seen easily by just staring directly at a moving image.


Wow! I can easily see the difference between 120 Hz and 60 Hz. 30 Hz (bottom one) looks absolutely horrible to me. I guess there is a large gap between how various people perceive the world.


About 10 fps is my lower limit. I can’t tell the difference for anything higher than 18 fps.


How much money could it possibly save? It might be more worth it to sell your GPU & buy a cheaper one. You might even be able to run at a higher framerate with those savings!


10-20 Euro per month, just from gaming. Across everything, it makes a difference of about 50-100 Euro a month.

20fps uses 70 watts, full speed is 200 watts. Times 5 hours, times 30 days, times €0.60.


Ah excellent. Thanks for correcting my erroneous correction attempt


Brings a whole new meaning to “idle” games!


What's the difference between a P-core and an E-core? Special machinery for caching, branch prediction and speculative execution?

I still have the antiquated view that performance and efficiency are essentially the same metric, and these are the only kinds of ideas I can come up with that would separate them.


A semi truck is a very efficient way to move a lot of packages across the country very quickly. It is a terribly inefficient way to move a single envelope.

Deeper speculation, larger out of order window, more accurate branch prediction, larger caches, more execution pipes, larger register files…


Speculative architecture is one thing but plain ol' superscalar architecture can make things run faster by simply executing more instructions per cycle, given you dedicated the die space to additional execution units to do more things at the same time.


E-cores are physically smaller and have fewer execution units, and what they do have is more likely to be shared, I think.

So they simply can’t process as much clock for clock.


They also have smaller resources like reorder buffers, so they can't speculatively keep nearly as much in-progress work around, leading to more stalls.


A bit OT, but has anyone had the opportunity to compare how Vimy feels vs. UTM?


> QoS is a control set and changed in the code launching a process, and can only be altered by the user if the app exposes a control, a feature which remains exceptionally uncommon. Neither is there any easy way for a user to know the QoS of any given process unless the app reveals that.

That... sucks


Does it? I buy my MacBook to use it, not manage process performance. When I launch a game, I get a popup that tells me that it will be prioritized while it is fullscreen. And there's a menu icon to turn off "game mode". Ok, fine, but I don't really care and I doubt I'll ever touch the setting.

It's there to be exposed if the developer thinks it should. That seems like the right balance to me.


Since we're talking about the scheduler, has it gotten better at not ping-ponging processes between cores?

I seem to remember years ago I could open Activity Monitor and watch processes migrate back and forth between cores for seemingly no reason instead of just sticking in place.

I haven’t watched to see what happens recently. Is that still an issue?


The reason for it is thermals.


How can you see which core a process is on from Activity Monitor? The only way I know of is via Instruments?


I think it was back when I ran a utility called MenuMeters which no longer exists.

iStatMenus is a similar utility that still exists and is far more popular.


iStat Menus doesn't show you _which_ core a given process is running on though? It only shows you total core activity?


Aside from cosmetics, is it really an issue? And why?


It causes latency issues on SMP machines with multiple processors for sure; I don't know whether or not that same issue translates to a similar scheduling problem across the same die, although I imagine it would; the context switch and the scheduler itself aren't costless.

Gamers have been binding to single processors/cores for years for that reason; it's a huge cause of microstutter-type behaviors -- but who knows if it's perceivable with all the magic going on in processors nowadays.


Processes get interrupted and rescheduled between tens and hundreds of times per second (depending on the OS, the scheduler may take over somewhere between 100 and 2000 Hz, although it doesn't necessarily deschedule every time). This is how SMP works at its most fundamental level. Every time that happens, it involves a full flush from the hardware perspective. It's always context switching, saving the registers, clearing the buffers, etc.

When a process gets descheduled and then later rescheduled, there is no particular reason to reschedule it on the same core it was on previously. Desktop class hardware doesn't have NUMA. Last level caches might have locality-dependent performance characteristics, but this is usually pretty nuanced, wildly variable between hardware microarchitectures, and not accounted for / not worth accounting for in the OS's scheduler, since there's a good chance that data won't be around next timeslice anyway.

Binding a process to a single core affinity-wise does not change any of this.

If you really have a throughput-bound process that needs to squeeze every single microsecond of compute and/or you're super latency bound for some reason, you schedule a process with what's called realtime priority, which changes the scheduling decision of WHEN to deschedule it. Not every operating system has equivalent mechanisms for this, but that's more in line with the phenomena that you're describing. But a process may well cause itself to be descheduled anyway by doing something that puts it to sleep. You have no control over this.
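
As a concrete example of what that looks like on one OS (Linux here; macOS's rough equivalent is the Mach time-constraint thread policy), a minimal sketch:

  #include <sched.h>
  #include <stdio.h>

  int main(void) {
      // SCHED_FIFO is a realtime class: the thread runs until it blocks or a
      // higher-priority realtime thread preempts it. Needs privileges
      // (root, CAP_SYS_NICE, or an appropriate RLIMIT_RTPRIO).
      struct sched_param sp = { .sched_priority = 10 };

      if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0) {
          perror("sched_setscheduler");
          return 1;
      }

      // ... latency-critical loop here ...
      return 0;
  }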

In reality, looking at which core a process is on is mostly the wrong way to be thinking. You're looking at this on a 1-2 second window (the amount of time it takes for Activity Monitor / Task Manager to update its UI). A full second of time is an absolute eternity to a process. Large amounts of work happen on the order of microseconds and quicker. And in that 1-2 second snapshot that Activity Monitor or Task Manager gave you, the process probably spent time on EVERY core -- and you're just seeing wherever it was in the brief moment when Activity Monitor / Task Manager updated its UI.

> gamers have been binding to single processors/cores for years for that reason, it's a huge cause of microstutter type behaviors

It wouldn't be the first time gamers prescribed placebo tweaks because of magical type thinking.


Desktops have mini-NUMA (core complexes et al.), and a scheduler might take this into account. I think the cache lines are emptied anyhow on a context switch, but otherwise multithreaded apps might be able to share some cache. Or they might not, and will thrash the cache and be better off in different complexes. Easy to contemplate opposite scenarios.


Depending on which cores it’s moving between it could be a performance issue if they don’t share a cache. I don’t know if it actually is or was.


I'm not a "hater" of Apple, I think they were just too expensive for me long time ago.

I will say since I use them for work, their ability to keep running despite opening so much crap is amazing. Browser tab wise. Not just limited to a specific silicon either as I use both intel/m2.


How so? I usually have 200-600 browser tabs open for weeks on Windows/ryzen 5000


Have you heard of bookmarks? And if you need a little bit of help with that, Tab Stash comes to the rescue https://addons.mozilla.org/en-US/firefox/addon/tab-stash/


From someone who sometimes becomes a tab hoarder... Bookmarks get lost; tabs are things you want to keep around until you are done with them. Every now and then I say "eff it" and clear all my tabs and start over, but most of the time I leave them up. It makes it easy to talk to friends about recent stories I just read if it's still there in my tabs.


Bookmarks also have considerable management overhead if one wants to keep them usable, largely thanks to browser vendors having either phoned in the design of their bookmark managers or left them stuck in 1999. For some reason an Nth cosmetic toolbar redesign is more important than making bookmark management not terrible.


Exactly. You can't even search within just one bookmark folder in chrome or do fuzzy search. Bookmark manager UX is hot garbage.


Just use history. It has great search. Close your tabs, let go. It’s a history search away if you want it back.


https://news.ycombinator.com/item?id=37992980

Also no, history has terrible search. You can't search within pages, search within a single folder/window, etc.


This argument has never made any sense to me. Once you get past 30-40 tabs, all that appears on screen is just a series of icons, and repeating icons at that as most tab hoarders I've seen from screen sharing have multiple tabs of the same site open in random locations. Unless you have some sort of photographic memory, it's certainly much easier to bookmarks stuff and organize it than haphazardly scan through a long series of icons to find the right page.


Vertical/tree-style tabs are a must if you're a tab hoarder, I can't imagine living without them.


I save all of my open tabs into a folder.

But now there are built in ways to manage tab groups, which I haven’t used yet, but look neat.


That's where you make a browser extension to dump your tabs before closing all


I have ADHD. None of these work.


That's not a lot tbh. I'm usually at ~1500 tabs (13 windows across 3 win+tab desktops) per machine on both my desktop (64GB 5800x3d) and macbook (32GB M2 Pro).


Why on earth would you do this? Genuine question.


I don't go quite that far, but you can keep various projects and activities tracked pretty well. Instead of regretting closing a few stackoverflow pages (which will be a nightmare to search for even with local history) you can leave up the useful ones, so you don't lose them, and close them all once the current task / project no longer needs them.

I used to make fun of people who hoarded them, but honestly sometimes just keeping that one tab open until you've exhausted its use is worth it.


I can't even imagine trying to locate that "useful tab" amongst 1500 open tabs. My bookmarks by contrast are nicely organized into folders. I can also open an entire folder of links at once, for example, my favorite blogs.


So typically what happens is I'll split out browser windows with tab groups. Firefox used to have a way of doing this from one window, but they scrapped it or it's just not default anymore; it was like "Virtual Desktops" but for tabs. It was insanely useful, and I think the tabs that were hidden were not even loaded in the background, so they were on standby until you came back to them.


I do something similar, because I can. Why waste time organizing & pruning them when I don't need to? I can just search for open tabs that I need, jump to windows to resume sessions I was working on earlier, etc. At some point it gets too much and I use OneTab to save all my open tabs and start over.


Each project has its own window. Ctrl+tab moves down the call stack and ctrl+shift+tab moves up the call stack. Exploring a topic often requires depth first search and a lot of ctrl+w and clicking "close tabs to the right" after researching subtopics. I've used this approach for over a decade and I've always been able to upgrade my machine fast enough to keep up.

Of course I also use ctrl+e (QuickTabs) to jump faster with intellij-like fuzzy search. It's kinda like Chrome's built-in ctrl+shift+a, except better.


Bragging rights.


Good lord. Now I'm hesitantly curious to know how many unread messages and unopened emails you have.


0 unread replies on the sites I use regularly (HN, reddit, twitter, messenger) and 0 unread emails in the Primary tab on gmail.

I used to be inbox-zero in all gmail tabs a couple of years ago and I will be yet again some day. The main limitation I'm running into is that a lot of useless emails ("We're making some changes to our PayPal legal agreements") have unpredictable subject lines and are sent from the same address that sends useful emails. I already have hundreds of gmail filters that auto-archive useless emails with predictable addresses or subject lines ("Your statement is ready") that I can't unsubscribe from without closing the account. LLMs should solve this problem some day.


Yeesh. I have a 32GB M2 Max and if I hit 15 (that's fifteen) tabs, I'll habitually quit the browser and restart it, if not the whole computer. Any apps in the background get force quit after the last window is closed.

Windows 98-brain is hard to get rid of...


My Windows 9x brain opens as many Netscape/Opera/MSIE windows as I want till 'it' crashes. Nowadays swapping heavily just slows down the machine.


It’s quite clearly a lot. Just not as much as you.


I meant that for a modern machine 1500 is nowhere close to the maximum amount you can have before TLB misses (and later page faults) slow down the entire machine. If a machine with similar specs can't handle that many tabs, something is misconfigured. Eg maybe memory timings are way too conservative or the wrong power profile is in use.


Remember a website can do anything it wants, and that includes leaking 32GB memory.

Browsers have been implementing background tab killing recently though, so just because you have a tab doesn't mean the process is actually backing it anymore.


I've seen the occasional memory leak from Youtube and Twitter, but only once a month. Not a big deal since I try to reboot for security updates every 3-6 months anyway.


1500 tabs easily touches enough RAM that average computers will keep swapping if that's actually your working set. Which is why any good browser should be completely freezing tabs that haven't been active in a while so their memory can remain swapped out, or even completely unloading them.

Though, 1500 tabs in one window is beyond the point at which it impacts Safari's responsiveness, but that's completely unrelated to the contents of the tabs and is instead the fault of taking way too long to update every NSView in the tab bar.


I have one specific TradingView tab that will crash the browser if I try to interact with it. Been that way for weeks.


Yeah, my comment is more about memory management, in this case in the 16GB range.

edit: I'm gonna get dinged for OT comment again ha


Weird, chrome freezes daily for me.


On what machine? As I mentioned up thread, if a machine with similar specs can't handle that many tabs, something is misconfigured. Eg maybe memory timings are way too conservative or the wrong power profile is in use.

If you open Resource Monitor before the freeze, you can usually see what happened on the graphs. If you're fast enough, you can also click "analyze wait chain" to see something that looks like a stack trace. The graphs and trace should make it pretty obvious why an app is frozen. Eg maybe it's trying to save the history sqlite database and your SSD ran out of SLC cache.


Seek help


I'm convinced the neurology of these people is set up in a way that they feel physically rewarded from combing through random hoarded stuff to find what they are looking for vs organizing things. It would make for an interesting scientific study as to why some users feel so compelled to keep so many tabs open. Is it FOMO, excellent recall (the only way I can see this being truly useful), or actual hoarding disorder gone digital???

As a person who has terrible short-term memory, I'm genuinely curious. I never hoard my tabs past a certain point, 10-20 tabs; it actually slows me down in finding what I'm looking for. If I'm working on specific projects and need to save the tabs for later, I just use tab groups in Chrome and associate them with a task, so when I context switch, I just open up an existing tab group and close my other non-related ones.

https://www.google.com/chrome/tips/#:~:text=You%20can%20grou....


1) Why so judgmental?

2) As somebody with ADHD, there's an unacceptable level of mental overhead required for organizing things. I simply don't have any spare bandwidth to add organizing tabs into bookmarks and organizing bookmarks or whatever else into my daily functioning as a human being. It has nothing to do with feeling rewarded.


I don’t think it deserves that much armchair psychology. I was joking.


I read "virtual cores" on a processor, got excited for a bit, but realized it's not the same as a soft-core processor.

I hope FPGAs take off one day and software can compile hyper-optimized processors that do very specialized tasks.


FPGAs are pretty slow clock-wise compared to a CPU; their main advantages are that they can work with many different kinds of clocks and are somewhat massively parallel. Do you have any specialized task you are thinking about?



