
The Age of Nvidia
http://www.alexstjohn.com/WP/2017/06/10/the-age-of-nvidia/
======
bazizbaziz
> "The growing body of Big-data, HPC, and especially machine learning
> applications don’t need Windows and don’t perform on X86. So 2017 is the
> year Nvidia slips its leash and breaks free to become a genuinely viable
> competitive alternative to x86 based enterprise computing in valuable new
> markets that are unsuited to x86 based solutions."

Google's TPU paper [0] showed that CPUs were relatively competitive in the
machine learning space (within 2x of a K80). It's not true that x86 doesn't
perform on these workloads.

The existence of the TPU itself threatens Nvidia's dominance in the ML
processor space. Google built an ASIC in a short time period that more than
rivals a GPU on these tasks. The TPU performance improvements (section 7) make
it look very straightforward to get even better performance with a few more
years of development effort. With developers moving to higher level libraries,
migration between GPU/CPU/TPU becomes painless, so they'll just go with
whatever has the lowest TCO. (Google hosted TPUs?)
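To make the "migration is painless" point concrete, here's a minimal sketch in
TF 1.x style (standard tf.device usage; the device strings are the only part
tied to the hardware, and "/gpu:0" assumes a GPU is present):

    import tensorflow as tf

    # Same model code regardless of backend; only the device string
    # changes between "/cpu:0", "/gpu:0", or a hosted TPU endpoint.
    with tf.device("/gpu:0"):
        a = tf.random_normal([1024, 1024])
        b = tf.random_normal([1024, 1024])
        c = tf.matmul(a, b)

    with tf.Session() as sess:
        print(sess.run(c))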

Aside from machine learning tasks, the author seems to be advocating for the
CPU/GPU combinations that AMD is already selling to game console
manufacturers. Granted, Nvidia has a piece of this via the Switch. If
Microsoft/Qualcomm go full-on with their ARM-based x86 emulation, then perhaps
a future ARM-based Xbox driven by an Nvidia chip is in the cards?
/speculation

[0] [https://arxiv.org/abs/1704.04760](https://arxiv.org/abs/1704.04760)

------
digitalzombie
I always thought keeping dominance via video games and DirectX was
Microsoft's main drive to keep Linux out. I didn't realize DirectX was their
move to fight against Intel.

The unintentional effect of this was that GPU manufacturers flourished, which
is something I didn't know either.

The two big monopolies fighting each other is super interesting. It also
highlights the fact that we need some diversity and competition in these
segments.

Rooting for AMD on the GPU front to give CUDA a good challenge, and on the
CPU front to challenge Intel.

~~~
koffiezet
> and directx was their main drive to keep Linux out

OS/2 Warp could have been a target, but Linux was nowhere near a threat to
Microsoft when they came up with DirectX...

------
21
> Intel keeps the PCIe bus slow and limits the number of IO lanes that an
> Intel CPU supports thus ensuring that GPU’s are always dependent on an Intel
> CPU to serve their workload

I was looking into building a multi-GPU machine for ML and was very confused
as to why the latest Intel CPU has fewer PCIe lanes than the previous one; it
didn't make any sense to me. Those sneaky Intel bastards...

~~~
arnon
They want you to buy the expensive Xeons, or higher-end i7s, to get the
proper number of PCIe lanes.
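Back-of-the-envelope math on why that stings for a multi-GPU build (a rough
sketch: ~985 MB/s per PCIe 3.0 lane is the standard figure, and the 40-vs-28
lane counts match the recent HEDT parts, but treat the numbers as
illustrative):

    # Rough PCIe 3.0 bandwidth per GPU in a 4-GPU box.
    MB_PER_LANE = 985  # usable MB/s per PCIe 3.0 lane

    def per_gpu(cpu_lanes, gpus=4):
        lanes = cpu_lanes // gpus  # ignores x16/x8/x4 slot granularity
        return lanes, lanes * MB_PER_LANE / 1000.0  # (lanes, GB/s each)

    print(per_gpu(40))  # older 40-lane part: (10, ~9.9 GB/s per GPU)
    print(per_gpu(28))  # newer 28-lane part: (7, ~6.9 GB/s per GPU)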

------
sensecall
Cached:
[http://webcache.googleusercontent.com/search?q=cache:JDuUy_X...](http://webcache.googleusercontent.com/search?q=cache:JDuUy_X4iUcJ:www.alexstjohn.com/WP/2017/06/10/the-age-of-nvidia/)

------
Hydraulix989
Uh... GPUs and CPUs fill very different niches.

GPUs aren't going to _replace_ CPUs any time soon because GPUs are neither
capable of nor designed for general-purpose processing. It's a fundamentally
REALLY HARD problem to use GPUs to speed up arbitrary computations (solve it
and you'll be a shoo-in for the Turing Award).

Businesses aren't going to be "moving" to GPUs. The "age of NVIDIA" is
primarily predicated on its role in accelerating training the machine learning
algorithms hyped up as "deep learning."

And AMD still very much has an uphill battle on both the CPU and the GPU
fronts.

But yes, the excitement around Intel that used to be there is now gone. You
can probably blame the "death of the PC" -- we teach kids coding in the Bay
Area, and every single kid has an iPad, but not a laptop. Sure, professional
engineers like us still have x86 laptops, but the average person does not.

~~~
ethikal
It's not just about accelerating ML, specifically deep learning. There are
many other enterprise technologies that can benefit from GPUs. One example:
OLAP-focused databases (such as MapD -
[https://www.mapd.com/](https://www.mapd.com/)). For some benchmarks, check
out this blog:
[http://tech.marksblogg.com/benchmarks.html](http://tech.marksblogg.com/benchmarks.html).

The DL "training" use-case is well-known at this point, but there are many
others which are emerging.

~~~
jhj
A GPU database isn't that useful, because the arithmetic intensity (ops/byte)
is relatively low. Cross-sectional memory bandwidth is what really matters;
you can get similar effects with a cluster of CPU machines provisioned
appropriately, with a shard or a replica of the database on each CPU machine.
I say this as someone who has written a GPU in-memory database of sorts that
is used at Facebook (Faiss). What is interesting, though, is if you can tie
the lookup to something with higher arithmetic intensity before or after it
on the GPU.

GPUs are only really being used for machine learning due to the sequential
dependence of SGD and the relatively high arithmetic intensity (flops/byte) of
convolutions or certain GEMMs. The faster you can take a gradient descent
step, the faster the wall-clock time to convergence, and you would lose on
memory reuse (for conv/GEMM) or on communication overhead and latency if you
attempted to split a single computation between multiple nodes. The Volta
"tensor cores" (fp16 units) make the GPU less arithmetic-bound for operations
such as convolution that require a GEMM-like operation, but the fact that the
memory bandwidth did not increase by a similar factor means that Volta is
fairly unbalanced.
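To put numbers on arithmetic intensity, a quick worked sketch (the
roofline-style arithmetic is standard; the matrix sizes and the
one-op-per-value scan are just for illustration):

    # Arithmetic intensity = ops per byte of memory traffic.
    def gemm_intensity(m, n, k, elem_bytes=4):
        flops = 2 * m * n * k                           # multiply-adds
        traffic = elem_bytes * (m * k + k * n + m * n)  # A, B, C touched once
        return flops / traffic

    print(gemm_intensity(4096, 4096, 4096))  # ~683 flops/byte: compute-bound
    print(1 / 4.0)  # a scan: ~1 op per fp32 value read, bandwidth-bound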

The point about Intel not increasing their headline performance by as much as
GPUs is also misleading. Intel CPUs are very good at branchy codes and are
latency optimized, not throughput optimized (as far as a general purpose
computer can be). Not everything we want to do, even in deep learning, will
necessarily run well on a throughput-optimized machine.

~~~
arnon
Actually, in columnar databases the ops/byte intensity is significantly
greater, and the GPU helps here.

If you think about how a database CAN be built, instead of how databases have
been built until now, you will find that there are very interesting ideas that
can and do make use of the GPU.
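A toy version of the columnar argument in numbers (the row width and column
types here are made up purely for illustration):

    # SELECT sum(price * qty) over 10**8 rows, needing only two columns.
    rows = 10**8
    row_store = rows * 200        # ~200-byte rows: every column comes along
    col_store = rows * 4 * 2      # two 4-byte columns, nothing else
    print(row_store / col_store)  # 25.0x less data moved for the same ops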

The research into these has been around since 2006, with a lot of interesting
papers published around 2008-2010. There are also at least 5 different GPU
databases around, each with their own aspects and suitable use-cases [1]...

[1] [https://hackernoon.com/which-gpu-database-is-right-for-me-6c...](https://hackernoon.com/which-gpu-database-is-right-for-me-6ceef6a17505)

------
SirFatty
I've always found Alex St. John to be a blowhard, having read his articles
dating back to Boot (Maximum PC). Not to mention the fact that he creates
malware like WildTangent.

------
ethbro
In the DirectX comments, I'm struck by how interesting the strategic
development of Intel and Microsoft is.

I'd never thought about it before, but there aren't many industries that could
instantly (2-4 years) be killed solely by technical development.

If Intel had developed an OS-independent abstraction layer for devices, if
Microsoft had pushed harder on ISA-independent programs, etc, how would today
look?

~~~
Symmetry
Oh, Intel was certainly a contributor to Linux and Microsoft flirted with
having Windows run on other hardware. But backwards compatibility prevented
either from commoditizing their complement on the desktop.

~~~
tachyonbeam
FYI, Microsoft now has an emulator to run x86 programs on ARM. This is for
future laptops/tablet products.

------
arnon
One of the things Nvidia excels at, and has done very well in, is supplying
the right ecosystem for writing performant GPU code. This really pushed the
adoption of GPUs forward.

Writing good, fast code for many x86 nodes is still quite difficult. Nvidia's
CUDA stack, including Thrust, CUB, cuBLAS and other libraries, really made it
easy to write parallel code without thinking too much about the lower-level
operations.
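For flavor, a sketch of what that ecosystem buys you. This uses CuPy, a
third-party Python layer over the same cuBLAS/Thrust/CUB-style primitives (an
analogy to, not part of, Nvidia's own stack):

    import cupy as cp  # third-party Python wrapper over CUDA libraries

    # High-level array code; the heavy lifting lands in device-side
    # library primitives with no hand-written kernels.
    x = cp.random.rand(10**7, dtype=cp.float32)
    x.sort()                       # device-side sort (Thrust/CUB-style)
    a = cp.random.rand(2048, 2048, dtype=cp.float32)
    c = a @ a                      # GEMM dispatched to cuBLAS
    print(float(x[-1]), float(c.sum()))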

------
frozenport
This post doesn't mention the strategic dilemma Intel has with GPUs. GPUs are
fast because they have fast local memory; the core count is secondary. Some
Intel chips have 22x16 vector processing elements clocked 2x higher than
NVIDIA's (around a 3k CUDA-core equivalent).

Why doesn't Intel make a fast-local-memory, GPU-style device? Well, they did
(Xeon Phi), but if they gave it a reasonable price point it would cannibalize
their existing x86 market, which is bigger than the GPU market.

------
microcolonel
His main idea is that x86 will decline in GPU-host roles because Intel is
deliberately limiting PCIe bus width, but then he shows AMD (another
manufacturer of x86 chips), which has no such interest, delivering what he's
asking for basically right now.

He also seems to think that NVIDIA will be the sole winner, but if you think
about it, AMD seems much better positioned. They manufacture chips which
implement the most popular server ISA (x86), and on the same process they
manufacture GPUs which are objectively more suitable for the tasks that are
driving adoption (or starting to be, now that the software is catching up).

So really the situation is not "NVIDIA is killing x86"; it's "discrete GPU
vendors are selling more GPUs than fit on Intel's x86 implementation, which
will drive the adoption of systems with lots of bus width."

~~~
jjm
It is. I think he ends with the possibility that, with ARM being integrated
into the GPU, a new world without Intel becomes possible. With Linux and
Windows both ARM-ready, the kinds of system builds that become possible could
be very interesting -- for one, having as many PCIe lanes as you want (or
something completely different). It would be great to see a platform that is
friendlier to extracting as much out of the hardware as possible.

~~~
microcolonel
Wouldn't that be basically exactly the same as integrating the GPU into the
CPU? This is something that AMD already does. The question is: _does it have a
considerable positive impact on compute workloads?_ It doesn't seem like
anyone is clamoring for it, so I doubt it matters much. More peripheral bus
bandwidth and more connectors is probably all that anyone cares about.

------
rsp1984
_Because this is the year that the first generation of self-hosting GPU’s are
widely available on the market_

Really? Then what's inside 99% of smartphones and tablets today? It's not
like ARM cores + beefy GPUs is a brand new concept. In fact, even the first
Raspberry Pi featured such a combination.

Intel already plays no role in the smartphone and tablet business, and anyone
who doesn't care about x86 compatibility has been free to use whatever non-
Intel accelerator they like in the datacenter / HPC space for years now. Not
quite sure what the dramatic thing is that the author implies has changed.

------
lrenaud
It's a fun article, but it actually drives me to a question: is it possible
(by which I mean remotely feasible) to design a user experience from the
ground up around a massively parallel system?

Let's say you had some low-speed hardware supporting a beefy GPU in a case:
could a manageable operating system be built that wouldn't spend all of its
time in IO-wait? I like the idea and the challenge of designing this way, but
I don't see a way forward that doesn't quite rapidly fall back on the
architecture of a core computational unit with specialized auxiliary units.

------
Apreche
Knowing this, what should a developer learn if they want to get a job in the
brave new non-wintel world? ARM? CUDA?

~~~
golergka
Most developers are not working on code that requires top-notch compute
power; I don't think the Nvidia/Intel rivalry will affect the market for
people who know JS, CSS and HTML to a significant degree.

But generally speaking, learning Caffe, TensorFlow, and math in general sounds
like a much more important investment than low-level libraries for specific
hardware.

~~~
pm90
I think this is a good point. One other thing I would like to emphasize,
having worked only a little while professionally, is that it's not that
important which specific architecture you choose while learning, except that
maybe you will write code faster and already know many of the nuances. There
are so many resources available for learning any architecture now that the
best advice would be to pick a reasonable one and just learn it; with that
experience, one can then move with confidence between architectures.

------
TheRealDunkirk
What struck me most was the part about another giant corporation abusing its
monopoly position to screw a competitor, at the expense of the customer.

~~~
Steltek
An intelligent market will always fall victim to a tragedy of the commons
scenario. Consumers will take the quick buck and let someone else worry about
the collective future impact.

------
combatentropy
I had wondered whether Microsoft and Intel were more a fraternity or an uneasy
alliance. This article gives hard details about how they have always been
forging sabers to cut each other's throats whenever the time was ripe.

It also woke me up to Intel's hardware throttling, starving the graphics unit
of bus lanes. I had been in awe of PCIe's bandwidth, but now I know it could
be so much more. It reminds me that if you focus on microspecs doubling every
few years, you think there's breathtaking progress, but if you step back and
look at the overall result, computing is moving much more slowly. For example,
it seems like the average laptop has always had 6-9 hours of battery life.

------
0x4f3759df
The YouTube video of the "Nvidia GTC 2017 conference" is worth a look.

~~~
visarga
For a moment they made me feel like it was 2007 again and I was watching an
Apple keynote. I didn't expect to have that feeling again from any company.

~~~
0x4f3759df
Elon Musk was at the 2015 conf, so you might like that too.

------
deepnotderp
One disagreement I have with this otherwise great piece is that the future
may very well end up being MIMD. MIMD is a more flexible programming model
than the lockstep SIMT of GPGPU, and examples like the Rex Neo, Adapteva
Epiphany and OpenPiton show efficiency that often surpasses GPUs.

~~~
omikun
SIMT is inherently more power-efficient than MIMD: less control-flow logic
per flop. Even then, it makes sense to devote dedicated logic to specific
algorithms. Even NVIDIA GPUs (Volta) are going to have special matrix-multiply
hardware (tensor cores) to increase power efficiency and performance.

The future lies not in flexible programming models but in dedicated
hardware/IP. Look at the crypto block, the ISP, h264/h265 encoding/decoding,
and now tensor cores. It's mentioned in what seems like every architecture
paper of the last ten years, but dark silicon is driving the need to
differentiate compute into smaller blocks. We can pack more and more
transistors into a chip, but we can only power a smaller and smaller fraction
of them at any given time. It only makes sense to make whatever can be
powered on as efficient as possible.

~~~
deepnotderp
In theory yes, but clean restarts like the Adapteva Epiphany and the Rex Neo
can get better efficiency than GPUs because they don't suffer from legacy
issues while still running legacy OpenCL code.

As for matrix-multiplier ASICs like the TPU and Volta, I consider them
incredibly uncreative, and calling one a "deep learning processor" an insult
of sorts to computer architecture. What happens when, tomorrow, SPNs or graph
ConvNets dominate? A proper application-specific processor will be able to
adapt and still maintain efficiency.

Obviously I have some bias and hubris here, but our simulations consistently
show superior efficiency to the TPU while running the same workloads, while
still retaining the ability to adapt to other computational graphs that
TensorFlow may choose to run.

------
rcdmd
The article's thesis, "Well it appears that the GPU era of computing is
finally here! Intel is in deep trouble," has an implicit assumption that
Intel's future mostly depends on processing power. Is that really the case?

------
anonu
Even with NVDA's recent rally, INTC is still very close to double NVDA's size
($168bn vs $88bn market cap). The point being, the author's statement that the
x86 party is over seems a bit far off at the moment...

------
Pigo
All of this did give me some insight into why I see ARM-based Windows servers
in Azure.

------
flipgimble
" 2017, the year GPU’s finally begin to permanently displace the venerated x86
based CPU"

He never specifies what he means by "displace" exactly, and follows up with
vacuous statements like "the x86 party ends in 2017". The entire thing sounds
like an attempt to manufacture a momentous event out of a gradual shift of
market profits and priorities over time. The best you could hope to state is
that Intel has failed to take a significant share of high-end throughput
computing (i.e. desktop and HPC GPUs), but it's doing quite well with
integrated GPUs.

Don't forget this is the same visionary that wrote the game industry playbook
on how to hire and exploit developers:
[https://www.kotaku.com.au/2016/04/alex-st-johns-ideas-about-...](https://www.kotaku.com.au/2016/04/alex-st-johns-ideas-about-game-development-are-terrifying/)

~~~
lnanek2
It's very wordy and takes a lot of reading, but he does have a pretty solid
point. First, he is talking about enterprise computing, and makes that clear
with: "Up until now Intel has held a dominant monopoly over Enterprise
computing for many years, successfully fending off all challengers to their
supremacy in the Enterprise computing space. This dominance is ending this
year and the market sees it coming." So the integrated graphics you mention
are irrelevant.

Then at the end he lists, with links, why he thinks that: SoftBank bought ARM
and invested in NVIDIA, which announced an ARM & NVIDIA integrated enterprise
computing product; IBM is supporting NVIDIA with a POWER and NVIDIA integrated
enterprise computing product; and AMD is supporting NVIDIA in Ryzen by
providing lots of PCIe bandwidth to the graphics card for compute tasks.

~~~
XorNot
Except AMD owns ATI, which is their own GPU brand. So no, they're doing that
because they want to move GPUs, and they'll want those GPUs to be AMD ones.

~~~
tankenmate
Also, it is a bit of a niche, but AMD's single precision compute is typically
better than NVIDIA's (spFLOP/$ both capital and operational).

~~~
rhaps0dy
By single precision, do you mean 32-bit floating-point computation?

Probably not but, if so, isn't that what both computer gaming and deep
learning need the most?

~~~
majewsky
Yes, at least for gaming. (Don't know about DNN.) Single-precision is the only
kind that GPUs supported until CUDA happened.

Around 2011, I got my feet wet in CUDA and tried to calculate quantum
waveforms (using a method that is mostly matrix multiplications and FFTs). I
eventually went back to doing stuff on the CPU because GPU memory was too
small in the systems that I had access to (256 MB), which restricted me to one
job at a time, whereas the CPU (a contemporary i7) had enough cores and memory
to do 4 jobs in parallel. And I needed double precision, which the GPU could
only execute at a tenth the speed of a single-precision job. Also, with the
GPU, I was restricted to running jobs during the night since those systems
were desktops that were also used for classes. Whenever one of my calculations
ran, it would occupy the GPU completely, thus rendering the graphical login
unusable.

I reckon that the situation would look much more favorable for the GPU today,
esp. because of the larger memory sizes and because double-precision speed has
caught up. But yeah, the most common uses need only single-precision.
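A quick, generic illustration of the fp32 limitation (nothing to do with my
actual waveform code, just the standard mantissa-width effect):

    import numpy as np

    # fp32 has a 24-bit mantissa (~7 decimal digits); past 2**24 it cannot
    # even represent every integer, so small increments vanish outright.
    x32 = np.float32(2**24)
    print(x32 + np.float32(1) == x32)  # True: the +1 is rounded away

    x64 = np.float64(2**24)
    print(x64 + np.float64(1) == x64)  # False: fp64's 53-bit mantissa copes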

~~~
imbusy111
The GeForce 500 series came out in 2010
([https://en.wikipedia.org/wiki/GeForce_500_series](https://en.wikipedia.org/wiki/GeForce_500_series))
and NONE of them had less than 1 GB of memory. Idk what GPU you used, but it
was old technology at that point.

~~~
majewsky
Probably. Whoever bought those machines probably didn't realize that GPU
performance was quickly becoming a relevant metric for scientific computation.

