
What I don't understand about computer chips is how relevant the FLOPS figure really is, because in most situations what limits computation speed is memory speed, not FLOPS.

So, for example, a big L2 or L3 cache will make a CPU faster. But I don't know whether a parallel task is always faster on a massively parallel architecture, and if it is, how I can understand why that is the case. It seems to me that massively parallel architectures just distribute the memory throughput in a more intelligent way. A back-of-the-envelope calculation of the kind I have in mind is sketched below.
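Here is a rough sketch of what I mean, using the usual "arithmetic intensity" idea (flops per byte moved from memory) and a roofline-style bound. The peak numbers are made up, purely for illustration:

    #include <stdio.h>

    /* Assumed machine peaks, purely for illustration. */
    #define PEAK_GFLOPS 500.0   /* peak compute, GFLOP/s  */
    #define PEAK_GBS     50.0   /* memory bandwidth, GB/s */

    int main(void) {
        /* SAXPY: y[i] = a*x[i] + y[i]
           per element: 2 flops; 2 loads + 1 store of 4-byte floats = 12 bytes */
        double intensity  = 2.0 / 12.0;            /* flops per byte        */
        double mem_limit  = PEAK_GBS * intensity;  /* bandwidth roofline    */
        double attainable = mem_limit < PEAK_GFLOPS ? mem_limit : PEAK_GFLOPS;
        printf("intensity:  %.2f flops/byte\n", intensity);
        printf("attainable: %.1f GFLOP/s of %.0f peak\n", attainable, PEAK_GFLOPS);
        return 0;
    }

At roughly 0.17 flops/byte, the memory system saturates long before the floating-point units do, so the advertised FLOPS number seems almost irrelevant for a kernel like this.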




You have to look at all the numbers (I/O, on-chip memory, FLOPS, threads) and see whether the architecture fits your problem. Some algorithms, like matrix-matrix multiplication, are FLOPS-bound. It's rare to see an HPC architecture (I don't know if there is one) that can't get close to theoretical peak FLOPS on matrix-matrix multiplication. Parallel architectures and parallel algorithm development go hand in hand.
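To make the FLOPS-bound claim concrete: an n-by-n multiply does 2n^3 flops over only 3n^2 doubles of data, so the flops-per-byte ratio grows linearly with n. A quick sketch (illustration only):

    #include <stdio.h>

    /* n x n double-precision matmul: 2*n^3 flops over 3*n^2 doubles
       (read A and B, write C). Arithmetic intensity grows linearly
       with n, which is why a blocked implementation that keeps tiles
       cache-resident can keep the FPUs busy. */
    int main(void) {
        for (int n = 64; n <= 4096; n *= 4) {
            double flops = 2.0 * n * (double)n * n;
            double bytes = 3.0 * (double)n * n * 8.0;
            printf("n = %4d: %7.1f flops/byte\n", n, flops / bytes);
        }
        return 0;
    }

That growing data reuse is exactly what blocked/tiled implementations exploit, which is why well-tuned matmul libraries sit near peak while bandwidth-bound kernels never get close.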



