> instead of PC limitation of 128 bit wide Memory interface width of modern CPUs...

sliken · 2024-10-30T17:28:46 1730309326

> Memory interface width of modern CPUs is 64-bit (DDR4) and 32+32 (DDR5).

Sure, per channel. PCs have 2x64 bit or 4x32 bit memory channels.

Not sure I get your point. Yes, PCs have 64 bit cache lines and Apple uses 128. I wouldn't expect any noticeable difference because of this. Generally a cache miss is sent to a single memory channel and results in a wait of 50-100ns, then you get 4 or 8 bytes per cycle at whatever memory clock speed you have. So Apple gets twice the bytes per cache line miss, but the value of those extra bytes is low in most cases.

Other, bigger differences are that Apple has a larger page size (16KB vs 4KB) and Arm supports a looser memory model, which makes it easier to reach a large fraction of peak memory bandwidth.

However, I don't see any relationship between Apple and PCs as far as DIMMs go. Both Apple and PCs can (and do) solder DRAM chips directly to the motherboard, normally on thin/light laptops. The big difference between Apple and PC is that Apple supports 128, 256, and 512 bit wide memory on laptops and 1024 bit on the Studio (a bit bigger than most SFFs). Getting more than 128 bits on a PC means no laptops and no SFFs, generally large workstations with Xeons, Threadrippers, or Epycs with substantial airflow and power requirements.

Rohansi · 2024-10-30T19:15:35 1730315735

FYI cache lines are 64 bytes, not bits. So Apple is using 128 bytes.

Also important to consider that the RTX 4090 has a relatively tiny 384-bit memory bus. Smaller than the M1 Max's 512-bit bus. But the RTX 4090 has 1 TB/s bandwidth and significantly more compute power available to make use of that bandwidth.
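To make the bus-width vs. bandwidth point concrete, here is a rough back-of-the-envelope sketch. Peak bandwidth is just bus width in bytes times the per-pin data rate. The data rates below (21 Gbps GDDR6X for the 4090, 6.4 Gbps LPDDR5 for the M1 Max, DDR5-5600 for a dual-channel PC) are my assumptions and not figures from this thread, and `peak_bandwidth_gb_s` is a made-up helper name.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Theoretical peak bandwidth in GB/s.

    bus_width_bits / 8 gives bytes moved per transfer; data_rate_gbps is the
    per-pin rate in gigatransfers per second.
    """
    return bus_width_bits / 8 * data_rate_gbps


# (bus width in bits, per-pin data rate in Gbps). The rates are assumptions.
configs = {
    "RTX 4090 (384-bit, GDDR6X @ 21 Gbps)": (384, 21.0),
    "M1 Max (512-bit, LPDDR5 @ 6.4 Gbps)": (512, 6.4),
    "Dual-channel DDR5-5600 PC (128-bit)": (128, 5.6),
}

for name, (width, rate) in configs.items():
    print(f"{name}: {peak_bandwidth_gb_s(width, rate):.1f} GB/s")
```

Under those assumptions the 4090 lands at roughly 1008 GB/s, in line with the 1 TB/s figure, while the wider M1 Max bus comes out around 410 GB/s because the memory runs at a much lower per-pin rate. These are theoretical peaks, not what you would measure.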

sliken · 2024-10-30T19:46:05 1730317565

Ugh, should have caught the bit vs byte, thanks.

The M4 Max is definitely not a 4090 killer; it does not match it in any way. It can, however, work on larger models than the 4090 and has a battery that can last all day.

My memory is a bit fuzzy, but I believe the M3 Max did decently in some games compared to the laptop Nvidia 4070 (which is not the same as the desktop 4070). But it depended heavily on whether the game was x86-64 (requiring emulation) and whether it was DX11 or Apple native. I believe Apple claims improvements in Metal (Apple's GPU library) and that the M4 GPUs have better FP for ray tracing, but no significant changes in rasterized performance.

I look forward to the 3rd party benchmarks for LLMs and gaming on the M4 Max.

reliabilityguy · 2024-10-31T01:52:06 1730339526

What I was trying to say is that there is no 128-bit limitation for PCs.