
Nvidia GeForce GTX 970: Correcting the Specs and Exploring Memory Allocation - wtallis
http://anandtech.com/show/8935/geforce-gtx-970-correcting-the-specs-exploring-memory-allocation
======
jacquesm
Nothing they can't fix with a refund or a discount. It'll hurt but the vast
majority of the users will be happy to continue to use what is a very high
performance card for a pretty good price.

After all, what are they going to buy instead with their refund? There would
have to be a card performing better for the same price to make a trade worth
it, and even then they'd be spending time ripping apart their PCs and
travelling back and forth to stores or sending units back to online shops.
Lots of trouble for little gain.

I'm happy that AnandTech took the time to point out the issue, but it isn't
nearly as bad as it looks from a technical point of view. The discrepancy is
only a major one if you notice it in an application you actually run, and in
that case it might be worth the trouble; if you're happy with the card, I
don't see the point.

Imagine you bought a car that was advertised as having 7 cylinders and it
turns out it only has 6. That would be a nightmare for any car company, but
in the end the number of cylinders is less important than whether the car
performs its assigned duty.

~~~
rockdoe
Why would they give a refund or discount in the first place? "I would like to
return my Haswell chip because it has come to my attention that some of the
execution units can't execute all instructions."

I guess the problem is that they put the memory bandwidth into the official
specs.

~~~
jacquesm
To make sure their customers are happy.

That's a good thing when you're selling products that people tend to replace
every 2 to 3 years.

~~~
anon4
Yeah, but what else are their customers going to buy? AMD?

~~~
wmf
The customers aren't actually going to exercise the refund; the point of
offering it is just to prevent backlash.

[http://www.joelonsoftware.com/articles/customerservice.html](http://www.joelonsoftware.com/articles/customerservice.html)
(scroll down to #7)

~~~
Someone1234
I don't see the relevance here, but that article was actually really
interesting. Thanks for the link.

------
richthegeek
Perhaps I'm being dense in not seeing the issue here? It seems like a
relatively minor mistake which turns the card from "obscenely good value for
money" into "very good value for money".

Generally I buy hardware based on performance, not on technical
specifications. This is certainly normal purchasing behavior for a CPU (a
3.0GHz Intel almost always beats a similarly specced 3.0GHz AMD), so I'm
surprised to hear people buy GPUs based on whatever obscure number they've
been told to care about.

~~~
gambiting
I bought my last card (a GTX 780) with the very specific intention of using
all (well, nearly all) of its 3GB of RAM for CUDA calculations I was doing at
the time. If it had turned out that I couldn't allocate it fully, the card
simply would not have been what was advertised, and I would have returned it.

~~~
anon4
You can allocate it fully even here, it's just that accessing some addresses
is slower than others.

That's a shitty situation. If you want really fully guaranteed specs and
performance, buy a workstation card.
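
A rough way to check the slowdown yourself (a sketch, not a definitive test:
the 256MB slice size is arbitrary, error handling is omitted, and whether the
driver maps the tail of one big allocation into the slow segment isn't
guaranteed - it reportedly tries to avoid the last 512MB until the fast
segment is full):

    // time writes to successive 256MB slices of one ~3.75GB allocation;
    // on a GTX 970 the final slices should show much lower bandwidth
    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void touch(char *p, size_t n) {
        size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) p[i] = (char)i;            // force a real write to every byte
    }

    int main() {
        const size_t chunk = 256u << 20;      // 256 MB per slice
        const int slices = 15;                // ~3.75 GB total, leaving headroom
        char *buf;
        if (cudaMalloc(&buf, slices * chunk) != cudaSuccess) return 1;

        cudaEvent_t t0, t1;
        cudaEventCreate(&t0); cudaEventCreate(&t1);
        for (int c = 0; c < slices; c++) {
            cudaEventRecord(t0);
            touch<<<(unsigned)(chunk / 256), 256>>>(buf + c * chunk, chunk);
            cudaEventRecord(t1);
            cudaEventSynchronize(t1);
            float ms;
            cudaEventElapsedTime(&ms, t0, t1);
            printf("slice %2d: %7.1f GB/s\n", c, chunk / 1e9 / (ms / 1e3));
        }
        cudaFree(buf);
        return 0;
    }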

~~~
pixl97
>If you want really fully guaranteed specs and performance, buy a workstation
card

Wrong answer. If you want guaranteed specs, you and every other user should
file a complaint with your country's advertising standards body. When a hefty
fine is laid upon said manufacturer, the spec sheets will become amazingly
accurate.

~~~
felixgallo
Heh. Is that so.
[http://www.intel.com/content/dam/www/public/us/en/documents/...](http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-e5-family-spec-update.pdf)

------
oflordal
Frankly, I am amazed that technical marketing at semiconductor companies gets
the specifications right as often as they do. They typically don't get to
bother engineering all that much, and even if they have worked in engineering
before, they probably didn't work at an architecture level.

It seems entirely plausible that this information never made it from
engineering to technical marketing, and/or was missed in a review.

------
pixl97
Everyone in this thread needs to take a step back from the technical merits of
the card itself. This is an issue of deceptive advertising and every buyer of
the card should report it as such.

Take this for example. You go to the store and buy 1 pound of lean ground
beef. You cook it, eat it, and it tastes great. A day later you find out that
your beef was actually 90% beef and 10% horse. Were you harmed? No. Is this a
deceptive practice? Yes.

Customers in the U.S. should contact the FTC, and in the U.K. the ASA, so the
manufacturer can be investigated and appropriately fined.

~~~
richthegeek
I disagree that this analogy is correct. To me it seems more like:

* The butcher offers you "1lb of beef"

* You buy it thinking it's 100% lean beef

* Turns out 12.5% of it is fatty beef

You still got 1lb of beef.

------
knweiss
IMHO the data transfer rate is the important info:

    
    
      First 3.5 GB : 196 GB/s == 7/8 of 224 GB/s
      Last  0.5 GB :  28 GB/s == 1/8 of 224 GB/s
    

_" When the GPU is accessing this segment of memory, it should achieve 7/8th
of its peak potential bandwidth, not far at all from what one would see with a
fully enabled GM204. However, transfer rates for that last 512MB of memory
will be much slower, at 1/8th the card's total potential."_
[http://techreport.com/review/27724/nvidia-the-geforce-
gtx-97...](http://techreport.com/review/27724/nvidia-the-geforce-
gtx-970-works-exactly-as-intended)

~~~
sharpneli
The data transfer rates are what will bite you, and they actually are
different than advertised.

I know of at least one group that bought a bunch of GTX 970 cards because
they were supposed to have the same memory bandwidth as the GTX 980, just
with less computation power. Their application is memory-bandwidth-bound, so
the additional computation would have been wasted.

However, this means they didn't really get what was promised: 196GB/s instead
of 224GB/s.

Even so, it still has the best performance/price combo for that particular
GPGPU application.

~~~
paulmd
It's somewhat misleading to look at an arithmetic average of the bandwidth of
the fast/slow segments. Due to the way they architected it, you cannot access
both the fast segment and the slow segment during the same memory-fetch
cycle; it's either/or. If control flow depends on data that's stuck in the
slow segment, performance could be significantly degraded.

Now - blah blah prediction, blah blah heuristics, yadda yadda. If you don't
use the memory fully (compute, 4K, etc) there's no problem, and even then you
can optimize the problem away somewhat. This will work pretty well for AAA-
grade game engines that get special attention - Unreal, CryEngine, Unity. But
for memory-bound (especially latency-sensitive) compute applications, what you
have here is a 3.5GB card, not a 4GB card.

Having a card show up with 1/8th of its specced memory units turned off is not
acceptable, regardless.

~~~
paulmd
Actually I should correct this - if you access the slow segment _at all_
performance will be degraded, since you cannot also access the fast memory
during the same cycle.

Looking at it on a 2-cycle basis: since the fast segment moves 7x as much
data per cycle, you can access either (7+7) or (7+1) chunks of memory. That's
a 43% performance drop if even 1 of the 32 threads in a warp consistently
needs to touch the slow segment.
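
Spelled out:

    fast only  : 7 + 7 = 14 chunks per two cycles
    fast + slow: 7 + 1 =  8 chunks per two cycles

    8 / 14 = 0.57..., i.e. a ~43% drop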

That data being used for control flow will amplify the problem, of course,
since latency will double.

------
Kenji
This is kind of like the PenTile controversy. A full-HD PenTile display does
not have 1920x1080 independent, atomic pixels. Therefore, in my eyes, it's an
obvious lie that the display is full HD. However, marketing says that a pixel
is not defined like that, and therefore it's okay.

"Any sufficiently advanced business model is indistinguishable from a scam."

People can say it's okay because the card actually has 4GB. But where do you
draw the line? If 0.5GB had a throughput of 1 byte per second, would it be a
scam now? Technically it's still 4GB.

~~~
paulmd
The issue isn't the quantity of memory - there is physically 4GB of GDDR5
memory soldered to the board, yes.

The issue is the number of ROPs. If the board had 64 ROPs - as advertised -
then there wouldn't _be_ a fast segment and a slow segment of memory. All 4GB
would perform equally, there wouldn't be access contention between the fast
segment and slow segment, etc.

There's no cute marketing spin you can put on the board being advertised one
way and quietly showing up with 1/8th of its memory subsystem disabled. NVIDIA
isn't some fly-by-night hardware company like Butterfly Labs or something.

------
gambiting
So, simple question - if I bought that card to do CUDA work, could I allocate
the whole 4GB of GPU memory or not? If not, I would absolutely be asking for
a refund.

Edit: OK, from what I understand you can allocate the whole 4GB, but 512MB of
it will be much slower memory. Also a basis for a refund in my eyes.
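
For what it's worth, allocation itself is easy to probe (a sketch; the driver
and display will already have taken a slice of the 4GB, so expect somewhat
less than the full amount even on a healthy card):

    // grab 128MB chunks until cudaMalloc fails, then report the total
    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        const size_t chunk = 128u << 20;   // 128 MB
        size_t got = 0;
        void *p;
        while (cudaMalloc(&p, chunk) == cudaSuccess)
            got += chunk;                  // leaked on purpose; exit cleans up
        printf("allocated %.2f GB before failure\n",
               got / (1024.0 * 1024 * 1024));
        return 0;
    }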

~~~
gavanwoolery
Sort of - with any card, you will never get the full amount, because the
operating system is likely using some GPU memory (and applications such as
your web browser are likely using it as well). On the other hand, you can
exceed the amount of GPU memory (using virtual memory, either via the drivers
or a hand-made solution), but depending on the amount of thrashing it can
slow things down greatly.
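
The hand-made version is just streaming: keep the dataset in host memory and
push it through a device-sized window (a sketch; `process` is a placeholder
kernel, and the repeated PCIe copies are exactly the thrashing mentioned
above):

    // process a host-side dataset larger than VRAM in 256MB chunks
    #include <cstdio>
    #include <vector>
    #include <cuda_runtime.h>

    __global__ void process(float *d, size_t n) {   // placeholder kernel
        size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) d[i] *= 2.0f;
    }

    int main() {
        const size_t chunk_n = 64u << 20;           // 64M floats = 256 MB
        const size_t total_n = 24 * chunk_n;        // ~6 GB of data, > 4 GB VRAM
        std::vector<float> host(total_n, 1.0f);

        float *dev;
        cudaMalloc(&dev, chunk_n * sizeof(float));  // one device-sized window
        for (size_t off = 0; off < total_n; off += chunk_n) {
            cudaMemcpy(dev, &host[off], chunk_n * sizeof(float),
                       cudaMemcpyHostToDevice);
            process<<<(unsigned)(chunk_n / 256), 256>>>(dev, chunk_n);
            cudaMemcpy(&host[off], dev, chunk_n * sizeof(float),
                       cudaMemcpyDeviceToHost);
        }
        cudaFree(dev);
        printf("streamed %.1f GB\n", total_n * sizeof(float) / 1e9);
        return 0;
    }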

~~~
jacquesm
I don't think that answers his question.

It has nothing to do with the OS, browsers, virtual memory or thrashing; it
has to do with the fact that 3.5GB of the 4GB is directly available, but the
remaining 512MB is only accessible through an indirect (and therefore slower)
path, as per the article.
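
If you want to stay out of the slow path in CUDA, the practical move is to
cap your working set below the fast segment (a sketch; the 3.5GB figure comes
from the article, cudaMemGetInfo only reports free/total bytes, and nothing
guarantees where the driver physically places an allocation):

    // budget device allocations to the fast 3.5GB segment
    #include <algorithm>
    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        size_t free_b, total_b;
        cudaMemGetInfo(&free_b, &total_b);      // free/total device memory
        const size_t fast = 3584ull << 20;      // 3.5 GB (per the article)
        size_t budget = std::min(free_b, fast); // avoid the slow 0.5 GB tail
        printf("using %.2f of %.2f GB\n",
               budget / 1073741824.0, total_b / 1073741824.0);
        char *buf;
        cudaMalloc(&buf, budget);
        cudaFree(buf);
        return 0;
    }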

------
discardorama
Does anyone know (or can anyone make an intelligent guess about) what impact
this will have on CUDA?

------
mrfusion
Could this be causing judder in the Oculus? And if so, is there a workaround?

