
The background for this really puzzles me. AMD said you should be seeing more improvement for Zen 5 over Zen 4. They go back and forth, and AMD says to run it under the administrator account. Zen 5 gets faster. But Zen 4 gets faster too. The improvement basically disappears again when the "fix" is applied universally. So why was AMD testing with two different setups?



The Phoronix benchmark suite showed performance in line with expectations. It's specifically games on Windows that regressed. I have no clue how the Windows scheduler works, but it can have a huge impact on games specifically.


Compared to Zen 4, Zen 5 has much higher latency between cores on different chiplets. It's possible that a scheduler + app combo could regress on Zen 5 for this reason. It sounds basically impossible for single-threaded apps to be impacted, because timeslices are really long compared to the cost of an occasional migration. Multithreaded apps where threads are communicating constantly could easily run slower if the scheduler treats all the cores as identical and interchangeable.
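A minimal Linux-only sketch of the workaround this implies: keep a group of chatty threads on one chiplet so their communication never crosses the inter-chiplet link. The `CCD_CPUS` layout is a hypothetical example (real CPU numbering varies by system), and `chiplet_cpus`/`pin_to_chiplet` are illustrative names; `os.sched_getaffinity` and `os.sched_setaffinity` are Python's standard wrappers for the Linux affinity calls.

```python
import os

# Hypothetical two-chiplet layout: logical CPUs 0-7 on CCD 0, 8-15 on CCD 1.
# Real numbering differs; on Linux, CPUs sharing an L3 (typically cache/index3)
# are listed in /sys/devices/system/cpu/cpuN/cache/index3/shared_cpu_list.
CCD_CPUS: dict[int, set[int]] = {
    0: set(range(0, 8)),
    1: set(range(8, 16)),
}


def chiplet_cpus(ccd: int) -> set[int]:
    """CPUs on the given chiplet that this process is currently allowed to use."""
    return os.sched_getaffinity(0) & CCD_CPUS[ccd]


def pin_to_chiplet(ccd: int) -> set[int]:
    """Restrict the calling thread to one chiplet (Linux only).

    Call this before starting the worker threads: threads created afterwards
    inherit the affinity mask, so they all stay on the same chiplet.
    """
    allowed = chiplet_cpus(ccd)
    if not allowed:
        raise ValueError(f"no usable CPUs on CCD {ccd}")
    os.sched_setaffinity(0, allowed)
    return allowed
```

Pinning like this trades flexibility for latency: if the chosen chiplet is busy, the threads wait instead of spilling onto idle cores on the other one.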

I don't know much about this topic, but it seems like Windows uses processor groups for scheduling[0], and generally tries to fit each NUMA node into one processor group (as long as the node has at most 64 logical processors). Since the issue here is latency between chiplets and no NUMA is involved, all the cores go into the same processor group.
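To make the point about groups concrete, here is a simplified model that packs whole NUMA nodes into groups of at most 64 logical processors. The packing rule is an assumption for illustration, not Windows' documented algorithm, and `assign_groups` is a hypothetical name. A single-socket desktop part reports one NUMA node, so both chiplets end up in the same group however far apart they are in latency terms.

```python
MAX_GROUP_SIZE = 64  # one 64-bit affinity mask per group on 64-bit Windows


def assign_groups(numa_node_sizes: list[int]) -> list[list[int]]:
    """Pack NUMA nodes (given as logical-processor counts) into processor groups.

    Simplified model: nodes are taken in order and never split; a new group is
    started when the next node would push the current group past 64.
    """
    groups: list[list[int]] = []
    current: list[int] = []
    used = 0
    for node, size in enumerate(numa_node_sizes):
        if size > MAX_GROUP_SIZE:
            raise ValueError(f"node {node} has {size} logical processors; this model does not split nodes")
        if current and used + size > MAX_GROUP_SIZE:
            groups.append(current)
            current, used = [], 0
        current.append(node)
        used += size
    if current:
        groups.append(current)
    return groups
```

For a 16-core, 32-thread desktop chip with a single NUMA node, `assign_groups([32])` returns `[[0]]`: one group holding both chiplets.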

[0]: https://learn.microsoft.com/en-us/windows/win32/procthread/p...


This sounds like a trivial fix: put the two chiplets into separate processor groups, because that's effectively what they are.

This feature was originally about non-uniform memory access (NUMA), but effectively it is a "core-to-socket" mapping. If a processor has chiplets on it, then it's effectively sockets-within-sockets. The software needs just a minor update to consider the chiplets to be the scheduling boundary instead of the AM5 socket.
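A toy sketch of what treating the chiplet as the scheduling boundary could look like, written as a soft preference rather than a hard lock: place a group of communicating threads on one chiplet when it has room, and spill across chiplets only when none does. `place_thread_group` and its `idle_by_ccd` input are hypothetical; this is not how the Windows scheduler is implemented.

```python
def place_thread_group(n_threads: int, idle_by_ccd: dict[int, list[int]]) -> list[int]:
    """Choose CPUs for a group of threads that talk to each other a lot.

    idle_by_ccd maps a chiplet id to the idle logical CPUs on that chiplet.
    Returns up to n_threads CPUs (fewer if not enough are idle overall).
    """
    # Prefer the chiplet with the most idle CPUs among those that fit the whole group.
    fitting = [c for c, idle in idle_by_ccd.items() if len(idle) >= n_threads]
    if fitting:
        best = max(fitting, key=lambda c: len(idle_by_ccd[c]))
        return idle_by_ccd[best][:n_threads]

    # No single chiplet fits: spill across chiplets, emptiest first,
    # rather than leaving threads waiting for one chiplet to free up.
    chosen: list[int] = []
    for c in sorted(idle_by_ccd, key=lambda c: -len(idle_by_ccd[c])):
        chosen.extend(idle_by_ccd[c][: n_threads - len(chosen)])
        if len(chosen) == n_threads:
            break
    return chosen
```

For example, with idle CPUs `[0, 1, 2]` on CCD 0 and `[8, 9, 10, 11, 12]` on CCD 1, a 4-thread group is placed entirely on CCD 1 as `[8, 9, 10, 11]`.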


Windows 'processor groups' aren't at all similar to Linux NUMA-aware scheduling, which is the proper method regardless. There are 64 bits that represent all the cores on the system, and setting those bits defines which cores a process can be assigned to. 'Processor groups' are a hack that keeps the same 64-bit bitmask they originally used.
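The bitmask-plus-group scheme can be modelled in a few lines. The shape mirrors Windows' `GROUP_AFFINITY` structure (a group number plus a 64-bit mask), but this is a Python illustration with hypothetical names, and it assumes every group holds exactly 64 logical processors numbered in order, which real systems do not guarantee.

```python
from dataclasses import dataclass

GROUP_SIZE = 64  # assumed: every group is full and groups are numbered in order


@dataclass(frozen=True)
class GroupAffinity:
    group: int  # processor group number
    mask: int   # bit i set => logical processor i within that group


def to_group_affinities(cpus: list[int], group_size: int = GROUP_SIZE) -> list[GroupAffinity]:
    """Split system-wide logical-processor indices into per-group bitmasks."""
    masks: dict[int, int] = {}
    for cpu in cpus:
        group, bit = divmod(cpu, group_size)
        masks[group] = masks.get(group, 0) | (1 << bit)
    return [GroupAffinity(g, m) for g, m in sorted(masks.items())]
```

A thread allowed on CPUs 0 and 70 needs two entries, `GroupAffinity(0, 1)` and `GroupAffinity(1, 1 << 6)`, which a single 64-bit mask cannot express.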

Nowadays Windows can and will schedule across processor groups, as per https://learn.microsoft.com/en-us/windows/win32/procthread/p...


Locking every process onto a random chiplet by default doesn't sound like a great plan either.


AFAIK it doesn’t lock them, it just preferentially co-schedules things into a socket.


My understanding was that a thread is only eligible to be scheduled in a single processor group at any given time, and that Windows will not change the group. Is that wrong?


That WAS correct. They changed it after realising a 96-core processor had fewer cores available than a 64-core processor, since processor groups split cores evenly.
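A quick arithmetic sketch of the even-split rule described above. `split_evenly` is a hypothetical helper that assumes groups are capped at 64 and balanced as evenly as possible; its input is a count of logical processors, not physical cores.

```python
import math


def split_evenly(logical_processors: int, max_group: int = 64) -> list[int]:
    """Group sizes when processors are divided evenly into as few groups as possible."""
    if logical_processors <= 0:
        raise ValueError("need at least one logical processor")
    groups = math.ceil(logical_processors / max_group)
    base, extra = divmod(logical_processors, groups)
    # The first `extra` groups take one leftover processor each.
    return [base + 1 if i < extra else base for i in range(groups)]
```

With 96 logical processors this gives `[48, 48]`, so a process confined to one group sees 48, while a 64-processor machine gives `[64]`.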


Maybe something to do with Game Mode on Windows.


My guess is there is a bug in Windows related to scheduling that makes Zen 5 slower, and it was masked because the Zen 5 tests used the admin account while the Zen 4 results they were compared against were run without it.

I mean, on Linux Zen 5 seems to be clearly faster than Zen 4 by a good margin.





