This should change soon, as EPYCs can do 204 GB/s of memory throughput (plus tons of PCIe 4.0 lanes). The I/O also isn't tied to the CPU SKU: all of the SKUs get all of the lanes.



It's NUMA, though. You'll have to write software specifically for NUMA to get anywhere close to that number. To make matters worse, EPYC also doubles the number of cores. To be fair, it also has a substantial amount of cache, but cache is no panacea when you need something in main RAM. And if you're churning through gigabytes of stuff per second, you'll be needing main RAM very, very often.


Yes, NUMA and large numbers of cores will pose completely new challenges for software optimization. Applications will have to become aware of the memory architecture of the underlying machine. They will have to make explicit assignments of memory allocations and threads to NUMA nodes based on their domain-specific needs. In some cases, even duplicating data structures may be the right call. This is going to challenge how most developers write fast software.
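
For a rough idea of what that looks like on Linux, here's a minimal libnuma sketch (one possible approach, not anything EPYC-specific): it pins one worker thread per NUMA node and gives each worker a buffer from its node's local memory. BUF_SIZE and the work loop are placeholders.

    /* gcc -O2 numa_workers.c -lnuma -pthread */
    #include <numa.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define BUF_SIZE (64UL * 1024 * 1024)  /* arbitrary 64 MiB working set */

    static void *worker(void *arg) {
        int node = (int)(long)arg;
        numa_run_on_node(node);                         /* pin to this node's CPUs */
        char *buf = numa_alloc_onnode(BUF_SIZE, node);  /* node-local memory */
        if (!buf) return NULL;
        memset(buf, 0, BUF_SIZE);                       /* touch pages so they get placed */
        /* ... node-local work on buf goes here ... */
        numa_free(buf, BUF_SIZE);
        return NULL;
    }

    int main(void) {
        if (numa_available() < 0) {
            fprintf(stderr, "no NUMA support on this system\n");
            return 1;
        }
        int nodes = numa_max_node() + 1;
        pthread_t *t = malloc(nodes * sizeof *t);
        for (int n = 0; n < nodes; n++)
            pthread_create(&t[n], NULL, worker, (void *)(long)n);
        for (int n = 0; n < nodes; n++)
            pthread_join(t[n], NULL);
        free(t);
        return 0;
    }

The same pattern covers the duplication case: allocate one copy of a read-mostly structure per node and have each worker read its local copy.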

The other thing is that even seemingly trivial things like spawning and synchronizing lots of threads will be much more complex on CPUs with many cores. At some point, naively looping over all threads is going to be too slow. I think the limit is going to be around 64 cores; past that, you should actually parallelize your worker thread management to stay efficient. There is precedent for this in HPC, e.g. MPI implementations.
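
To illustrate: instead of the main thread creating and joining N workers in a loop (O(N) on the critical path), you can fork recursively so each thread starts half of the remaining range, which makes the spawn/join depth O(log N). A toy pthreads sketch; NTHREADS and work() are made up for the example.

    /* gcc -O2 tree_spawn.c -pthread */
    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 64            /* hypothetical core count */

    struct span { int lo, hi; };   /* half-open range of worker ids */

    static void work(int id) {
        printf("worker %d running\n", id);  /* stand-in payload */
    }

    static void *spawn_range(void *arg) {
        struct span s = *(struct span *)arg;
        if (s.hi - s.lo == 1) {    /* leaf: do the actual work */
            work(s.lo);
            return NULL;
        }
        int mid = s.lo + (s.hi - s.lo) / 2;
        struct span left = { s.lo, mid }, right = { mid, s.hi };
        pthread_t child;
        /* hand the right half to a new thread, recurse on the left half */
        pthread_create(&child, NULL, spawn_range, &right);
        spawn_range(&left);
        pthread_join(child, NULL); /* right stays valid until the child is joined */
        return NULL;
    }

    int main(void) {
        struct span all = { 0, NTHREADS };
        spawn_range(&all);
        return 0;
    }

This is roughly the trick MPI implementations use for collectives like barriers and reductions: tree-shaped fan-out and fan-in rather than a linear loop.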



