There's some interesting backstory to this article about network-attached storage database systems. The point the article makes towards the end, that a 400GbE NIC would be only 25% utilized (and only unidirectionally, with room to spare for traffic the other way), kind of blew me away.
If you're trying to scale a database's IOPS, scaling out is even easier!
ScyllaDB had a quick article a couple of years ago on the shrinking number of instruction cycles available per packet at high network speeds. I can't find the quip anymore, but I think it was something like 50 CPU cycles or less, which, for a complex database (like a Cassandra clone), is not much time to compute a response.
1.6 terabit cards will be nuts, but if you have 64 cores, maybe not actually that much different, provided the bandwidth is dedicated/segmented per core.
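A rough back-of-the-envelope sketch of that cycle budget (my assumptions, not from the article: 3 GHz cores, minimum-size 64-byte Ethernet frames, 20 bytes of per-frame on-wire overhead):

```python
# Cycle budget per packet at line rate, unidirectional.
# Assumptions (mine): 3 GHz cores, 64B frames + 20B wire overhead.
LINK_GBPS = 400          # link speed, gigabits per second
CPU_GHZ = 3.0            # clock rate of one core
FRAME_BYTES = 64 + 20    # minimum frame plus preamble/inter-frame gap

pps = LINK_GBPS * 1e9 / (FRAME_BYTES * 8)   # packets per second at line rate
budget_1core = CPU_GHZ * 1e9 / pps          # cycles per packet, single core
budget_64core = budget_1core * 64           # spread across 64 cores

print(f"{pps / 1e6:.0f} Mpps")
print(f"{budget_1core:.1f} cycles/pkt on 1 core")
print(f"{budget_64core:.0f} cycles/pkt across 64 cores")
```

So at 400Gb/s worst case you get about 5 cycles per packet on one core, and only a few hundred even with the work spread across 64 cores, which is roughly where that "50 cycles or less" flavor of quip comes from (real traffic has larger average packets, so the practical budget sits somewhere in between).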
This is rapidly running way OT, but I'm starting to feel like the system, and even the core complex, should stop being coherent at some point.
Just make a chip with 4x separate core complexes, and let them talk to each other over PCIe NTB (non-transparent bridging) or some such, as a network link rather than a coherent fabric. Make the network card SR-IOV, exposing several distinct PCIe endpoints, and attach one to each of the 4x groups of cores.
Just looking at the fact that we spend half of an EPYC's 128 PCIe lanes building a 2P system, it feels like we shouldn't. Keep building multi-core, keep scaling, but stop trying to keep everything coherent. Start using networking, not coherency.
As I said, running rapidly OT. But a big big feel.