Memory interface width of modern CPUs is 64-bit (DDR4) and 32+32 (DDR5).
No CPU uses 128b memory bus as it results in overfetch of data, i.e., 128B per access, or two cache lines.
AFAIK Apple uses 128B cache lines, so they can do much better design and customization of memory subsystem as they do not have to use DIMMs -- they simply solder DRAM to the motherboard, hence memory interface is whatever they want.
> Memory interface width of modern CPUs is 64-bit (DDR4) and 32+32 (DDR5).
Sure, per channel. PCs have 2x64 bit or 4x32 bit memory channels.
Not sure I get your point, yes PCs have 64 bit cache lines and apple uses 128. I wouldn't expect any noticeable difference because of this. Generally cache miss is sent to a single memory channel and result in a wait of 50-100ns, then you get 4 or 8 bytes per cycle at whatever memory clock speed you have. So apple gets twice the bytes per cache line miss, but the value of those extra bytes is low in most cases.
Other bigger differences is that apple has a larger page size (16KB vs 4KB) and arm supports a looser memory model, which makes it easier to reach a large fraction of peak memory bandwidth.
However, I don't see any relationship between Apple and PCs as far as DIMMS. Both Apple and PCs can (and do) solder dram chips directly to the motherboard, normally on thin/light laptops. The big difference between Apple and PC is that apple supports 128, 256, and 512 bit wide memory on laptops and 1024 bit on the studio (a bit bigger than most SFFs). To get more than 128 bits with a PC that means no laptops, no SFFs, generally large workstations with Xeon, Threadrippers, or Epyc with substantial airflow and power requirements
FYI cache lines are 64 bytes, not bits. So Apple is using 128 bytes.
Also important to consider that the RTX 4090 has a relatively tiny 384-bit memory bus. Smaller than the M1 Max's 512-bit bus. But the RTX 4090 has 1 TB/s bandwidth and significantly more compute power available to make use of that bandwidth.
The M4 max is definitely not a 4090 killer, does not match it in any way. It can however work on larger models than the 4090 and have a battery that can last all day.
My memory is a bit fuzzy, but I believe the m3 max did decent on some games compared to the laptop Nvidia 4070 (which is not the same as the desktop 4070). But highly depended on if the game was x86-64 (requiring emulation) and if it was DX11 or apple native. I believe apple claims improvements in metal (the Apple's GPU lib) and that the m4 GPUs have better FP for ray tracing, but no significant changes in rasterized performance.
I look forward to the 3rd party benchmarks for LLM and gaming on the m4 max.
Memory interface width of modern CPUs is 64-bit (DDR4) and 32+32 (DDR5).
No CPU uses 128b memory bus as it results in overfetch of data, i.e., 128B per access, or two cache lines.
AFAIK Apple uses 128B cache lines, so they can do much better design and customization of memory subsystem as they do not have to use DIMMs -- they simply solder DRAM to the motherboard, hence memory interface is whatever they want.