
Inside Pascal: Nvidia's Newest Computing Platform - jonbaer
https://devblogs.nvidia.com/parallelforall/inside-pascal/
======
tomkinstinch
Wow. It has 15.3 billion transistors. It's amazing we can buy something with
that many engineered parts. Even if the transistors are the result of
duplication and lithography, it's an astonishing number. Creating the mask
must have taken a while.

Does anyone know what the failure rate is for the transistors (or transistors
of a similar production process)? Do they all have to function to spec for a
GPU, or are malfunctioning transistors disabled or corrected? What does the QC
process look like?

~~~
djcapelis
Exact failure and bin rates at most semiconductor companies are considered
deep, dark internal trade secrets. Other than pure scale, yield rates are one
of the biggest factors in semiconductor cost and profit margin.

And the answer is: it depends. If you lose certain critical transistors, you
lose the entire chip. But the vast majority of the transistors on each chip
are part of caches or of many, many duplicate GPU cores, which, if they fail
to pass tests, can be disabled or downclocked, and the chip is then binned
into the appropriate product line.

With GPUs this is much easier than with other types of chips, because the
level of functional duplication allows a lot of flexibility. If a core is bad,
you use a different one, and GPU cores are small enough that it would be
stupid not to put some spares on each chip. Same with the memories.

Generally one can safely assume:

* Most chips that come off the line are binned into a lower category and do not function at max spec for everything, which is why the price jump is so high at the extreme upper end of a hardware series.

* With ASIC lithography, most transistor malfunction isn't correctable; you mostly have to either downclock (for some types of faults) or disable (for the rest) that piece of the chip.

* The rate of transistor malfunction is still incredibly, fucking amazingly, phenomenally low. With 15B transistors on a chip, you can barely afford a failure rate of even one in a billion (rough arithmetic at the end of this comment).

So your line has to be, as the kids say: on fleek.
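
To put rough numbers on that last point (my own back-of-the-envelope with made-up failure rates, not anything NVIDIA publishes): with N transistors and an independent per-transistor fault probability p, a simple Poisson model gives

    expected faults per die ≈ N · p = 15.3e9 × 1e-10 ≈ 1.5
    P(die has zero faults)  ≈ exp(−N · p) ≈ exp(−1.53) ≈ 0.22

So even at one fault per ten billion transistors, only about a fifth of dies would come out completely clean, which is exactly why the redundancy and binning above matter so much.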

~~~
ckozlowski
Spot on.

I used to work for an OEM, and the Intel and AMD engineers would quietly
explain to us how this worked on a number of occasions.

I think the AMD X3 chips were the best example of this. They were quad-core
parts that AMD was manufacturing at the time, but a defect made one core
faulty, so that core was disabled and the chip was sold as a triple-core part.
[http://www.zdnet.com/article/why-amds-triple-core-phenom-is-...](http://www.zdnet.com/article/why-amds-triple-core-phenom-is-a-bigger-deal-than-you-think/)

~~~
mud_dauber
I would expect that partially defective chips are repaired during probe or
(more likely) final test by blowing fuses.

The chip's yield, and therefore its cost, will depend on the foundry's natural
defect rate per unit area and on the design quality.
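
Same Poisson idea as above, just expressed per unit area the way foundries quote it (illustrative defect density, not TSMC's real number): with die area A and defect density D0,

    Y ≈ exp(−A · D0) ≈ exp(−6.1 cm² × 0.1 defects/cm²) ≈ 0.54

GP100 is roughly a 610 mm² die, so at that assumed D0 only about half the dies would have no defect at all; salvaging the rest via fuses, disabled units, and lower bins is where the margin comes from.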

------
wyldfire
Half precision ftw! ML is the use case they're designing for, but we all get
to reap the benefits.

~~~
pavlov
Hasn't half precision (16-bit float) been in Nvidia GPUs forever? I could
swear it was available back in the very first shader-capable GeForce FX days.

~~~
wyldfire
IIRC what's new here is native support in the ALUs, etc. I think the older
support was probably in software.
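
A minimal sketch of what the native path looks like (my example, using the public cuda_fp16.h intrinsics, not anything from the article): Pascal GP100 can execute packed half2 arithmetic directly in the ALUs at twice the FP32 rate, whereas earlier chips mostly treated FP16 as a storage format and widened it to FP32 for math.

    #include <cuda_fp16.h>

    // Packed-FP16 AXPY: each thread does one fused multiply-add on a pair of
    // half-precision values. On sm_60 (GP100) __hfma2 maps to a native FP16x2
    // instruction; compile with e.g. nvcc -arch=sm_60.
    __global__ void haxpy(int n, __half2 alpha, const __half2 *x, __half2 *y)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n / 2)                          // two halves per vector element
            y[i] = __hfma2(alpha, x[i], y[i]);  // y = alpha * x + y, both lanes
    }

Launching it is the usual haxpy<<<blocks, threads>>>(n, a, x, y); the interesting part is just that the half math no longer has to be promoted to single precision first.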

~~~
rsp1984
FP has always been 16- or 20-bit precision in the Tegra (mobile) chips up to
Tegra 4.

------
mtgx
IBM's Power9 and its future Power ISA 3.0 CPUs, which should increasingly
focus on deep-learning/big-data optimization, combined with Nvidia's GPUs,
which will increasingly optimize for the same, should make an interesting
match over the next 5+ years.

On the gaming side, I do hope they continue to optimize for VR. I think AMD is
even slightly ahead of them on that.

------
gnuvince
Please say it's programmable in Pascal.

~~~
venomsnake
That will only be for the Turbo models. It was an awesome language.

~~~
m_mueller
I'm not sure whether it was called Turbo Pascal, but we had this Pascal editor
with built-in 2D draw windows to learn programming with in high school. I'm
still looking for something equivalent (but maybe a bit more modern/portable)
to give to my kid when he shows some interest. Scratch is nice, but the
visual programming becomes limiting very quickly. Is there anything like this
today?

~~~
draven
openFrameworks and Cinder are C++, so they're probably not a good fit for a
first experience.

Check out LÖVE: [https://love2d.org/](https://love2d.org/)

It's in Lua (so learning the language won't be a big part of the whole
experience) and runs on many different platforms.

~~~
m_mueller
LÖVE does indeed look awesome, thanks!

------
marmaduke
Is there a description somewhere without all the blah-blah hype? A comparison
with past architectures would also be welcome, again without the hype.

~~~
pklausler
The article has several tables that juxtapose the specs of the previous,
current, and new generations, and I think that you will enjoy reading it.

~~~
marmaduke
I have a Fermi board which, despite worse numbers, easily outpaces a Kepler
and a Maxwell board for my workload.

So, yeah, those tables are hype too. I am asking about benchmarks on real
workloads.

~~~
bsprings
(Post author here.) Curious to hear more details about your workload, because
a 5+-year-old Fermi would truly be hard pressed to outperform Maxwell or even
a Kepler K40, let alone Pascal.

~~~
marmaduke
It's parameter sweeps of a delay differential equation, one simulation per
thread. This requires a lot of complex array indexing and global memory
access, so arithmetic density isn't anywhere near optimal. Still, it's a
real-world workload that benefits hugely from GPU acceleration.

Moving from a GTX 480 to a Kepler or Maxwell card, the numbers go up, but not
the performance. I might have a corner case, but before investing in new
hardware, I would want to benchmark first and not blindly follow the numbers.
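
For what it's worth, here's a guess at the shape of that kind of kernel (a hedged sketch of the access pattern described, not the actual code): one simulation per thread, with the delayed state in a per-simulation ring buffer in global memory, which is where the arithmetic density goes to die.

    // One delay-differential-equation simulation per thread; each thread sweeps
    // a different parameter value. The delay history lives in a per-simulation
    // ring buffer in global memory, pre-filled on the host with the initial
    // history function.
    __global__ void dde_sweep(int n_sims, int n_steps, int delay_steps, float dt,
                              const float *params,  // one parameter per simulation
                              float *history,       // n_sims * delay_steps floats
                              float *result)
    {
        int sim = blockIdx.x * blockDim.x + threadIdx.x;
        if (sim >= n_sims) return;

        float a = params[sim];
        float x = history[sim * delay_steps + delay_steps - 1];  // state at t = 0
        for (int t = 0; t < n_steps; ++t) {
            int slot = sim * delay_steps + (t % delay_steps);
            float x_delayed = history[slot];   // roughly x(t - tau), written delay_steps ago
            x += dt * (-a * x + x_delayed);    // toy linear DDE: x' = -a*x(t) + x(t - tau)
            history[slot] = x;                 // recycle the slot for step t + delay_steps
        }
        result[sim] = x;
    }

With this layout, neighboring threads touch addresses delay_steps floats apart, so the loads don't coalesce; transposing the buffer to history[(t % delay_steps) * n_sims + sim] would, and that kind of memory-layout detail tends to matter far more across GPU generations than the peak numbers in the spec tables.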

------
JustSomeNobody
300 Watts.

Toasty.

------
timeu
Sorry, but does it run Crysis? ;-)

But seriously, it's quite an impressive piece of hardware.

