RAM is no longer fast: unless the data is cached, an access costs around 150 CPU cycles.
RAM is no longer byte-addressable either. It’s closer to a block device now, the block size being 16 bytes for dual-channel DDR and 32 bytes for quad-channel.
Too bad many computer scientists who write books about those algorithms prefer to view RAM the old-fashioned way: fast and byte-addressable.
For most practical purposes on x86, I believe the block size to consider should be at least a cache line, i.e. 64 bytes.
I now think the next target ought to be defined by latency and power, in particular with IoT requirements in mind.
Microseconds, Milliwatts, Millions of endpoints.
or, for storage: Microseconds, Millions of IOPS, Multi-Parity
or, for compute efficiency, IoT goals:
Milliwatts as a constraint for a compute benchmark
Millions of endpoints / processes or cores on a network topology
Microseconds as a generic measure of access to resources: whether the memory sits on another node or in local store, the delay ought to be within the same order of magnitude as the target.
My purpose in all of this is to ask: should there be, in fact can there be, consideration of architectures and topologies that avoid .. rather, can we still aim for linear cost in addressing complexity at all, as a goal, or has that been lost already?

I mean "that" and "lost" very vaguely, being a non-expert, but my question is really this: should I assume there are no effective gains to be had, in designing for an IoT-style or massively networked future, from the way memory is addressed? Or has the complexity we have been introduced out of necessity, and so is here to stay for practical reasons, making the idea of low-latency, low-power, ad hoc IoT "grid computing" just smoke in my pipe?
Don't forget the 8-deep prefetch buffer. The real block size is more like 128 bytes for dual channel.