My point of reference is that back in undergrad (~10-15 years ago), I recall a class assignment where we had to optimize matrix multiplication on a CPU; typical good parallel implementations achieved about 100-130 gigaflops (on a... Nehalem or Westmere Xeon, I think?).
My point of reference is that back in undergrad (~10-15 years ago), I recall a class assignment where we had to optimize matrix multiplication on a CPU; typical good parallel implementations achieved about 100-130 gigaflops (on a... Nehalem or Westmere Xeon, I think?).