The background for this really puzzles me. AMD said you should be seeing more im...

Out_of_Characte · 2024-08-17T21:06:19 1723928779

The Phoronix benchmark suite showed performance in line with expectations. It's specifically games on Windows that regressed. I have no clue how the Windows scheduler works, but it can have a huge impact on games specifically.

anonymoushn · 2024-08-17T21:27:24 1723930044

Compared to Zen4, Zen5 has much higher latency between cores on different chiplets. It's possible that a scheduler + app combo could regress on Zen5 for this reason. It sounds basically impossible for single-threaded apps to be impacted because timeslices are really big. Multithreaded apps where threads are communicating constantly could easily run slower if the scheduler sees all the cores as identical and interchangeable.

I don't know much about this topic, but it seems like Windows uses Processor Groups for scheduling[0], and generally tries to fit each NUMA node into 1 Processor Group (as long as it has at most 64 cores in it). Since the issue here is latency between chiplets and no NUMA is involved, all the cores go in the same Processor Group.

[0]: https://learn.microsoft.com/en-us/windows/win32/procthread/p...
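To make the "cores are not interchangeable" point concrete, here is a minimal sketch of grouping logical CPUs by chiplet so that tightly communicating threads can be kept together. It assumes logical CPUs are numbered contiguously per chiplet and that the chiplet size is known; both are illustrative assumptions, and the function names are hypothetical. Real topology should be read from the OS rather than assumed.

```python
def chiplet_cpu_sets(total_cpus: int, cpus_per_chiplet: int) -> list[set[int]]:
    """Group logical CPU indices by chiplet.

    ASSUMPTION: CPUs are numbered contiguously per chiplet
    (0..n-1 on the first chiplet, n..2n-1 on the second, ...).
    """
    if total_cpus <= 0 or cpus_per_chiplet <= 0:
        raise ValueError("counts must be positive")
    return [
        set(range(start, min(start + cpus_per_chiplet, total_cpus)))
        for start in range(0, total_cpus, cpus_per_chiplet)
    ]


def same_chiplet(cpu_a: int, cpu_b: int, cpus_per_chiplet: int) -> bool:
    """True if both CPUs fall on the same chiplet under the same assumption."""
    return cpu_a // cpus_per_chiplet == cpu_b // cpus_per_chiplet
```

On Linux, one of the returned sets could be passed to `os.sched_setaffinity(0, cpus)` to keep a process on a single chiplet. Whether that helps depends on how much the threads actually communicate.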

jiggawatts · 2024-08-18T00:01:09 1723939269

This sounds like a trivial fix: put the two chiplets into separate processor groups, because that's effectively what they are.

This feature was originally about non-uniform memory access (NUMA), but effectively it is "core-to-socket" mapping. If a processor has chiplets on it, then it's effectively sockets-within-sockets. The software needs just a minor update to consider the chiplets to be the scheduling boundary instead of the AM5 socket.

Out_of_Characte · 2024-08-18T19:06:50 1724008010

Windows 'processor groups' aren't at all similar to Linux NUMA-aware scheduling, which is the proper method regardless. There are 64 bits that represent all the cores on the system, and setting those bits defines which cores a process can be assigned to. 'Processor groups' are a hack that keeps the same bitmask that they originally used.
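A rough sketch of the 64-bit mask idea, for illustration only: one bit per logical CPU, so a single mask cannot name more than 64 CPUs, which is why a (group, mask) pair is needed on larger machines. The helper names are hypothetical and this does not call any Windows API.

```python
GROUP_SIZE = 64  # a single mask has 64 bits, one per logical CPU


def affinity_mask(cpus) -> int:
    """Build a 64-bit mask with one bit set per allowed CPU index."""
    mask = 0
    for cpu in cpus:
        if not 0 <= cpu < GROUP_SIZE:
            raise ValueError(f"CPU {cpu} does not fit in a {GROUP_SIZE}-bit mask")
        mask |= 1 << cpu
    return mask


def cpus_in_mask(mask: int) -> list[int]:
    """Return the CPU indices whose bits are set in the mask."""
    return [bit for bit in range(GROUP_SIZE) if mask & (1 << bit)]
```

For example, `affinity_mask([0, 1, 3])` sets bits 0, 1 and 3, while asking for CPU 64 raises an error because there is no bit left to represent it.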

Nowadays Windows can and will schedule across processor groups, as per https://learn.microsoft.com/en-us/windows/win32/procthread/p...

Dylan16807 · 2024-08-18T04:20:51 1723954851

Locking every process onto a random chiplet by default doesn't sound like a great plan either.

jiggawatts · 2024-08-18T07:24:14 1723965854

AFAIK it doesn’t lock them, it just preferentially co-schedules things into a socket.

Dylan16807 · 2024-08-18T23:33:08 1724023988

My understanding was that a thread is only eligible to be scheduled in a single processor group at any given time, and that Windows will not change the group. Is that wrong?

Out_of_Characte · 2024-08-19T11:29:49 1724066989

That WAS correct. They changed that after realising that a 96-core processor has fewer cores available to a process than a 64-core processor, since processor groups split cores evenly.
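The arithmetic behind that can be sketched as follows, assuming (as the comment says) that logical processors are split evenly across the minimum number of groups of at most 64. This is a simplified model with a hypothetical helper; actual group assignment also takes NUMA nodes into account.

```python
import math


def split_into_groups(logical_cpus: int, max_group_size: int = 64) -> list[int]:
    """Split CPUs evenly across the fewest groups that each hold <= max_group_size.

    ASSUMPTION: an even split, as described in the comment above.
    """
    if logical_cpus <= 0:
        raise ValueError("logical_cpus must be positive")
    groups = math.ceil(logical_cpus / max_group_size)
    base, extra = divmod(logical_cpus, groups)
    # The first `extra` groups take one additional CPU when the split is uneven.
    return [base + 1 if i < extra else base for i in range(groups)]
```

Under this model 96 logical processors become two groups of 48, so a process confined to one group sees 48 CPUs, fewer than the 64 it would see on a 64-processor machine with a single group.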

Rohansi · 2024-08-17T23:02:04 1723935724

Maybe something to do with Game Mode on Windows.

dathinab · 2024-08-17T21:32:56 1723930376

My guess is that there is a bug in Windows related to scheduling which makes Zen 5 slower, and that it was masked because the testing used admin accounts while the Zen 4 results it was compared against were taken without an admin account.

I mean, on Linux Zen 5 seems to be clearly faster than Zen 4 by a good margin.