
An History of Nvidia Stream Multiprocessor - ingve
http://fabiensanglard.net/cuda/index.html
======
boulos
Fabien, how come you skipped Volta? (Presumably because it wasn’t interesting
for Display, but it’s a huge step up from Pascal)

The most interesting thing to me in the progression of NVIDIA’s compute
offerings is that until Volta, the “SIMT” claims were just a programming model
and the hardware was actually just like any other SIMD hardware (though with
automatic mask stacks). In Volta, they finally added “we can run different ops
at different program counters”.
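
To see what that means concretely, here is a toy Python simulation (purely illustrative, not how the hardware is actually built): on a mask-based SIMD machine, every lane of a warp steps through both sides of a divergent branch, and a per-lane active mask decides which lanes commit results. Volta's per-thread program counters relax exactly this lockstep.

```python
# Toy model of mask-based divergence on a pre-Volta-style SIMD machine.
# All lanes execute both sides of the branch; the active mask decides
# which lanes commit. This is what the "SIMT" programming model compiles
# down to when the hardware is really SIMD with automatic mask stacks.

def run_divergent_branch(warp):
    """Execute `if x % 2 == 0: x //= 2 else: x = 3*x + 1` across all lanes."""
    mask_taken = [x % 2 == 0 for x in warp]

    # Pass 1: lanes where the condition holds run the if-side;
    # inactive lanes are masked off and keep their values.
    warp = [x // 2 if active else x for x, active in zip(warp, mask_taken)]

    # Pass 2: the mask is inverted and the remaining lanes run the else-side.
    warp = [3 * x + 1 if not active else x for x, active in zip(warp, mask_taken)]
    return warp

print(run_divergent_branch([4, 7, 10, 9]))  # [2, 22, 5, 28]
```

Both passes always execute, so a fully divergent warp pays for both paths; Volta's independent thread scheduling doesn't remove that cost, but it lets lanes at different program counters make forward progress (e.g. a lane can acquire a lock another lane in the same warp is spinning on).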

For the casual reader, the Hot Chips 2017 presentation is probably the easiest
skim:

[https://www.hotchips.org/wp-content/uploads/hc_archives/hc29...](https://www.hotchips.org/wp-content/uploads/hc_archives/hc29/HC29.21-Monday-Pub/HC29.21.10-GPU-Gaming-Pub/HC29.21.132-Volta-Choquette-NVIDIA-Final3.pdf)

------
szatkus
> Since Intel proved that there is still room for miniaturization with the 7nm
> of Ice Lake, there is little doubt Nvidia will leverage it to shrink its SM
> even more and double performance again.

Intel calls their process 10nm, and it's weird to cite it as proof, since it
was a total disaster.

------
twic
The SVG doesn't render right at all in Firefox, for me.

~~~
fabiensanglard
Thanks for telling me, I had no idea. It looks like Firefox and Edge don't
render in the same way Inkscape/Chrome do.

It seems converting the whole SVG from "Object to Path" fixed the issue
everywhere.

~~~
twic
I know next to nothing about SVG; do you think this was a mistake in the SVG,
or is it a bug in Firefox's rendering? If the latter, ideally we would file a
bug report, although I don't actually know how you do that for Firefox these
days.

~~~
fabiensanglard
I think it is a bug in Firefox's rendering, since Inkscape and Chrome render
it properly.

If I remove the "writing-mode:tb-rl" attribute, Firefox renders it properly.
It seems FF does not support that attribute.

I have kept both versions of the SVG and intended to file a bug report.

[http://fabiensanglard.net/cuda/g71.svg](http://fabiensanglard.net/cuda/g71.svg)
[http://fabiensanglard.net/cuda/g71_org.svg](http://fabiensanglard.net/cuda/g71_org.svg)

I have this weird gut feeling it will be closed as "we know, we don't support
it yet, come back later". So I abandoned the idea.

~~~
twic
Interestingly, a documentation overview says that tb-rl is an allowed value:

[https://developer.mozilla.org/en-US/docs/Web/SVG/Attribute/P...](https://developer.mozilla.org/en-US/docs/Web/SVG/Attribute/Presentation#attr-writing-mode)

But the detailed documentation it links to does not:

[https://developer.mozilla.org/en-US/docs/Web/SVG/Attribute/w...](https://developer.mozilla.org/en-US/docs/Web/SVG/Attribute/writing-mode)

I had a look in Bugzilla and couldn't find anything specific to tb-rl, but did
find an ancient meta bug for writing-mode which links to millions of other
bugs:

[https://bugzilla.mozilla.org/show_bug.cgi?id=writing-mode](https://bugzilla.mozilla.org/show_bug.cgi?id=writing-mode)

------
Scene_Cast2
I'm curious about the addition of Tensor Cores. It sounds like both the
regular Shader Units and Tensor Cores can carry out GEMM, just that the latter
are more efficient at it. I'm guessing that for ML applications, both the
Tensor Cores and Shader Units are used.

Are tensor cores idle during 3D gaming? How do they differ from the regular
Shader Units? Why are they more efficient?
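
On the efficiency question, here is a rough Python sketch (illustrative only, not NVIDIA's implementation): each Volta-generation tensor core performs a fused 4×4 matrix multiply-accumulate, D = A×B + C, as a single hardware operation, where regular shader units would issue the same work as dozens of individual FMA instructions.

```python
# Illustrative sketch of the operation a Volta-class tensor core performs
# per clock: a fused 4x4 matrix multiply-accumulate, D = A @ B + C.
# Regular shader units would spend one FMA instruction per multiply-add;
# a fixed-function MMA unit does the whole tile in one go, which is where
# the efficiency (and the narrower applicability) comes from.

def mma_4x4(a, b, c):
    """D = A @ B + C for 4x4 matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(4)) + c[i][j] for j in range(4)]
        for i in range(4)
    ]

identity = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
zeros = [[0] * 4 for _ in range(4)]
a = [[float(i * 4 + j) for j in range(4)] for i in range(4)]

print(mma_4x4(a, identity, zeros) == a)  # A @ I + 0 == A, prints True
```

The specialization cuts instruction-fetch and scheduling overhead per arithmetic operation, at the cost of only being useful for matrix-shaped work like GEMM.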

~~~
hesk
> Are tensor cores idle during 3D gaming?

The tensor cores are used for real-time raytracing, ML-based anti-aliasing
during gaming, and other features. The Turing whitepaper has a lot of
information on that.

[https://www.nvidia.com/content/dam/en-zz/Solutions/design-vi...](https://www.nvidia.com/content/dam/en-zz/Solutions/design-visualization/technologies/turing-architecture/NVIDIA-Turing-Architecture-Whitepaper.pdf)

A more generic answer to this question is that everything NVIDIA puts in their
GPUs is for the gaming market. Other use cases are incidental.

~~~
Causality1
Between the 30% framerate hit and the (to me) barely-perceptible visual
difference, real-time raytracing still feels like a gimmick.

~~~
mrguyorama
The new Minecraft stuff is the very first time I've seen raytracing produce
graphics that make my brain accept that this could be a "realistically lit"
scene, and it doesn't even take advantage of things like significant amounts
of reflection and re-reflection. HOWEVER, I see zero benefit to the actual
game from that.

~~~
kevingadd
In the case of Minecraft no new graphics feature is really going to benefit
the "actual game", but players who obsess over their created spaces might
really appreciate having better lighting that they can control through things
like reflectivity. It's something you'd have to poll the average teen-who-
still-plays-Minecraft about though.

------
tyingq
The title led me down this somewhat interesting rabbit trail of "why an?":
[https://www.merriam-webster.com/words-at-play/everything-you...](https://www.merriam-webster.com/words-at-play/everything-youve-ever-wanted-to-know-about-historic-and-historical)

~~~
fabiensanglard
Fixed. Thank you!

~~~
tyingq
Oh, sorry, wasn't saying it was wrong. People seem split on it. Just curious.

------
mshockwave
> This fragmented design reminds of the Pre-Tesla layered architecture,
> proving once again that history likes to repeat itself.

So true.

~~~
corysama
On the Hardware Wheel of Reincarnation [1] we are definitely swinging back
into the realm of special-purpose processors and processor extensions all over
the place. The gigahertz scaling free lunch has been dead for over a decade.
We're still adding more transistors, but finding it harder to speed up day-to-
day serial code. So, instead we're adding dedicated hardware for the common,
hard stuff and throwing special-purpose chips into every nook and cranny of
everything around the periphery of the main thread on the main CPU.

[1]
[https://www.computerhope.com/jargon/w/wor.htm](https://www.computerhope.com/jargon/w/wor.htm)

~~~
TaylorAlexander
We are also inventing new ways to do computation. Massive deep neural nets
require different hardware, and it seems there’s plenty of room to improve and
experiment there.

------
ver_ture
I wish I understood more of this, what a cool field of engineering.

------
captainbland
Nice comparison. If I could offer one bit of feedback: I think that graph at
the end should be TFLOPS rather than GFLOPS.

~~~
fabiensanglard
Good catch! Fixed.

------
villgax
One of the things that gave me a fit of laughter was reading that Jensen used
to work at AMD before!!

~~~
david-gpu
I heard from a first-hand account that in the late nineties Jensen tried to
sell nVidia's IP to ATI. They refused, even though the offer was good. The
rest, as they say, is history.

~~~
philjohn
Didn't he want to be made head of ATI at the time though, as part of the deal?

~~~
szatkus
That was AMD. They wanted to merge with Nvidia for its graphics IP, but
wouldn't accept that condition, so they eventually bought ATI instead.

