Datacenter GPUs are mostly identical to the much cheaper consumer versions. The only thing preventing you from running a datacenter with consumer hardware is the licensing agreement you accept.
"The only thing preventing you from running a datacenter with consumer hardware is the licensing agreement you accept."
The consumer cards don't use ECC, and memory errors are a common issue (GDDR6 runs at the edge of its capabilities). In a gaming situation that means a polygon might be wrong, a visual glitch occurs, or a texture isn't rendered right -- things that just don't matter. For scientific purposes that same glitch could be catastrophic.
The "datacenter" cards offer significantly higher performance for some case (tensor cores, double precision), are designed for data center use, are much more scalable, etc. They also come with over double the memory (which is one of the primary limitations forcing scale outs).
Going with the consumer cards is one of those choices that can turn out to be a Pyrrhic victory. If funds are super low and you just want to get started, sure, but any implication that the only difference is a license is simply incorrect.
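To make the memory point concrete, here's a rough CUDA sketch of the "does it fit on one card?" question that ends up forcing a scale-out. The 20 GiB figure is just a made-up working-set size, and error checking is omitted:

    // mem_limit_probe.cu -- does a hypothetical working set fit on one GPU?
    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        cudaDeviceProp p;
        cudaGetDeviceProperties(&p, 0);
        printf("%s: %.1f GiB on board\n", p.name,
               p.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));

        // Hypothetical 20 GiB working set: fits on a 32 GB V100,
        // not on an 11 GB 2080 Ti.
        size_t want = 20ull * 1024 * 1024 * 1024;
        void *buf = nullptr;
        cudaError_t err = cudaMalloc(&buf, want);
        if (err == cudaSuccess) {
            printf("20 GiB allocation fits on one card\n");
            cudaFree(buf);
        } else {
            printf("allocation failed (%s): time to shard across GPUs\n",
                   cudaGetErrorString(err));
        }
        return 0;
    }

Once a single buffer or model no longer fits, you're splitting work across cards, which is exactly the scale-out headache the extra memory postpones.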
Could you cite an example where the ECC requirement on a GPU was real and demonstrated to be needed? In practice, I don't know anyone who'd willfully take a 10-15% perf hit on GPUs because of a cosmic ray.
The thermal design of the "datacenter" cards can be better, for sure. And the on-board memory size and design. That's about it. And for how many times the GeForce price tag is that?
"In practice, I don't know anyone who'd willfully take 10-15% perf hit on GPUs, because of a cosmic ray."
Virtually every server in a data center runs on ECC; the notion of not using it is simply absurd. And given that the Tesla V100 gets 900 GB/s of memory bandwidth with ECC, versus 616 GB/s on the 2080 Ti without ECC, the supposed performance hit is a strawman to begin with.
NVIDIA further states that there is zero performance penalty for ECC.
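If you want to see what your own cards report, the CUDA runtime exposes both the ECC state and the raw memory clock/bus numbers. A minimal sketch; the factor of 2 assumes a DDR-style transfer rate, so treat the result as a rough theoretical peak rather than a measured figure:

    // ecc_bw_query.cu -- report ECC state and rough theoretical bandwidth.
    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        int n = 0;
        cudaGetDeviceCount(&n);
        for (int d = 0; d < n; ++d) {
            cudaDeviceProp p;
            cudaGetDeviceProperties(&p, d);
            // memoryClockRate is reported in kHz, memoryBusWidth in bits.
            double peak_gbs = 2.0 * (p.memoryClockRate * 1e3) *
                              (p.memoryBusWidth / 8.0) / 1e9;
            printf("GPU %d: %s | ECC %s | ~%.0f GB/s theoretical peak\n",
                   d, p.name, p.ECCEnabled ? "on" : "off", peak_gbs);
        }
        return 0;
    }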
As to whether the requirement is "real": Google did an analysis and found that their ECC memory corrected a bit error every 14 to 40 hours per gigabit.
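Back-of-envelope, with the big caveat that Google's figure was measured on server DRAM rather than GDDR6, so this is just scaling their number and not a measurement: an 11 GB card is roughly 88 gigabits, so one correctable error per gigabit every 14 to 40 hours works out to roughly 2 to 6 corrected errors per hour, i.e. on the order of one flipped bit every 10 to 30 minutes that a non-ECC card would silently pass through.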
"That's about it."
Also ECC memory. Also dramatically higher double-precision performance. Dramatically higher tensor performance. Aside from all of that... that's it.
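If anyone wants to see the double-precision gap on their own hardware, here's a crude CUDA probe: a dependent FMA chain timed with events. It's not a proper peak-FLOPS benchmark and error checking is omitted, but the FP32/FP64 ratio shows up clearly:

    // fp64_vs_fp32_probe.cu -- rough timing of an FMA chain in float vs double.
    #include <cstdio>
    #include <cuda_runtime.h>

    // Each thread runs a dependent chain of fused multiply-adds in type T.
    template <typename T>
    __global__ void fma_loop(T *out, int iters) {
        T a = (T)1.0 + (T)threadIdx.x * (T)1e-6;
        const T b = (T)1.000001;
        const T c = (T)0.999999;
        for (int i = 0; i < iters; ++i)
            a = a * b + c;                                  // one FMA per iteration
        out[blockIdx.x * blockDim.x + threadIdx.x] = a;     // keep the work alive
    }

    // Launch once to warm up, then time a second launch with events.
    template <typename T>
    float time_ms(int iters) {
        const int blocks = 256, threads = 256;
        T *out = nullptr;
        cudaMalloc(&out, blocks * threads * sizeof(T));

        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        fma_loop<T><<<blocks, threads>>>(out, iters);       // warm-up
        cudaEventRecord(start);
        fma_loop<T><<<blocks, threads>>>(out, iters);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);

        cudaEventDestroy(start);
        cudaEventDestroy(stop);
        cudaFree(out);
        return ms;
    }

    int main() {
        const int iters = 1 << 20;
        float ms32 = time_ms<float>(iters);
        float ms64 = time_ms<double>(iters);
        printf("FP32: %.1f ms  FP64: %.1f ms  slowdown: %.1fx\n",
               ms32, ms64, ms64 / ms32);
        return 0;
    }

On a GeForce-class part you'd expect the FP64 run to come out dramatically slower; on a V100-class part the gap is only around 2x, which is the whole point.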
Nah, that's not really it.
The reason NVIDIA doesn't allow this is precisely that the additional RAM -- functionally the only difference -- is not cost efficient. People would like to (and did) use a bunch of consumer 1080s, which is exactly why NVIDIA disallowed it. You had to buy the equivalent pro-grade card, which easily costs two or three times as much and offers a couple more GB of RAM.