
AMD Radeon VII: High-End 7nm Vega Video Card
https://www.anandtech.com/show/13832/amd-radeon-vii-high-end-7nm-february-7th-for-699
======
dragontamer
It's an interesting design, for sure.

4 stacks of HBM2 means 1 TB/s of memory bandwidth and 16GB of capacity. Only
60 compute units are enabled (maybe 4 CUs are expected to break during
manufacturing? It's a weird number, for sure...).

Since it shares dies with the MI50, the Radeon VII will have 1/2-rate double-
precision, making this the cheapest high-performance double-precision card in
existence.
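
For anyone who wants to sanity-check those numbers, here's a quick back-of-
the-envelope sketch (the ~1.8 GHz boost clock and the MI50-style 1/2-rate
FP64 are assumptions on my part, not confirmed specs):

    # Back-of-the-envelope numbers for the specs above. Assumed: ~1.8 GHz boost
    # clock and MI50-style 1/2-rate FP64 (neither confirmed by AMD here).
    stacks, bus_per_stack_bits, pin_rate_gbps = 4, 1024, 2.0

    bandwidth_gbs = stacks * bus_per_stack_bits * pin_rate_gbps / 8
    print(f"Memory bandwidth: ~{bandwidth_gbs:.0f} GB/s")      # ~1 TB/s

    shaders, boost_ghz = 60 * 64, 1.8                          # 60 CUs x 64 shaders
    fp32_tflops = shaders * 2 * boost_ghz / 1000               # 2 FLOPs per FMA
    fp64_tflops = fp32_tflops / 2                              # assumed 1/2 rate
    print(f"FP32 ~{fp32_tflops:.1f} TFLOPS, FP64 ~{fp64_tflops:.1f} TFLOPS")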

\------

FP16 compute is supported at double speed, but there are no tensor cores. So
FP16 matrix multiplication / tensor ops are still a major benefit to NVidia.

But the memory size and bandwidth are mouth-watering. That's a lot of
bandwidth, and a number of problems are known to be memory-bound. Deep
learning enthusiasts will probably stick to NVidia cards, but other compute
problems may want to start playing around with this thing.
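
To make "memory-bound" concrete, here's a rough roofline-style estimate
(reusing the assumed peak numbers from above):

    # Roofline-style check: a kernel is memory-bound when its arithmetic
    # intensity (FLOPs per byte moved) falls below peak_flops / peak_bandwidth.
    peak_fp32_tflops = 13.8        # assumed peak from the estimate above
    peak_bw_tbs      = 1.0         # ~1 TB/s of HBM2

    ridge = peak_fp32_tflops / peak_bw_tbs
    print(f"Need > {ridge:.0f} FLOPs per byte to become compute-bound")

    # e.g. SAXPY (y = a*x + y) does 2 FLOPs per 12 bytes moved, ~0.17 FLOPs/byte,
    # so it runs at memory speed no matter how many CUs are enabled.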

\------

Video gamers seem meh about the specs. But I think anyone looking at this card
for its compute performance would be impressed.

~~~
doctorpangloss
> Deep learning enthusiasts probably will stick to NVIDIA cards

We've been building the wrong hardware for ML for a while now.

A TPU doesn't deliver supremacy over GPU hardware for these problems. It's a
Google senior-engineering retention and PR project.

Exceedingly few problems resemble image recognition. You wouldn't be able to
tell from the research, because with the current tools it's sort of the only
affordable thing to do.

Playing Go barely looks like it. Or more accurately, Go doesn't look like most
other games. Molecular physics and synthesis barely looks like it. Neural
programs/neural Turing machines that matter (converting complex sim code like
game rules and physics into neural networks via learning) may never achieve
the accuracy or performance/watt necessary to really compete with just making
a bunch of CPUs or straight up dedicated hardware.

Word embeddings are a disaster. People keep trying to do innovative stuff with
them, and it's 2018 and we've only just recently gotten entity names from
Wikipedia. I think Peter Norvig, having basically only machine translation to
point to, is really going to eat crow when non-perceptual AI tasks sort of
have nothing to do with the stuff Google has developed. But what do I know.

I'm sure someone's going to trot out some obscure TPU or GPU-driven thing.
It's been years! Awesome innovations in hardware really do lead to obvious,
immediate crazy cool stuff. I'm really talking about the Kinect as an example
here, in that it was really hyped and not much came of it. The TPU, and ML-
specific hardware targeted at today's problems, is a slower-motion Kinect of
our time.

Mining cryptocurrency, the most recent innovation, is the _opposite_ of cool.

I'm confident these tools exist to capitalize on the subsidy from the gaming
industry. AMD is wise to not chase ML features.

~~~
nradov
What is the right hardware for ML?

~~~
doctorpangloss
Probably Larrabee. Too early and too expensive for its time.

~~~
twtw
So that's why Xeon Phi has been so successful!

/s

Real talk, though. Why do you think Larrabee is the "right" hardware?

~~~
vl
The truth is that for many types/sizes of models, AVX2 or AVX-512 in Xeons is
as fast as GPUs.

~~~
frankchn
I am interested in seeing benchmarks regarding this. I believe this is true
for models with truly huge embeddings, but otherwise if your models are even a
little compute dense then GPUs are faster.

~~~
Cybiote
My experience is that you need the additional caveat of a streaming, fairly
homogeneous, and highly parallelizable workload, or the memory-transfer,
branching, and communication overheads will eat away nearly all of the GPU's
gains.

I've also noticed that a lot of the time, people are comparing GPUs to naive
implementations (and sometimes, implementations written in dynamic languages)
instead of to highly tuned BLAS or MKL implementations. For a large swathe of
problem types, using the fastest math library will reduce the CPU/GPU gap to
less than an order of magnitude.
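
A minimal illustration of that point (sizes are arbitrary, and numpy here just
stands in for "a tuned BLAS/MKL backend"):

    import time
    import numpy as np

    n = 128
    a, b = np.random.rand(n, n), np.random.rand(n, n)

    # The naive triple loop that GPU speedups are often (unfairly) measured against.
    def naive_matmul(a, b):
        c = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                s = 0.0
                for k in range(n):
                    s += a[i, k] * b[k, j]
                c[i, j] = s
        return c

    t0 = time.time()
    naive_matmul(a, b)
    t_naive = time.time() - t0

    t0 = time.time()
    a @ b                    # dispatches to the tuned BLAS that numpy links against
    t_blas = time.time() - t0

    print(f"naive: {t_naive:.3f}s  BLAS-backed: {t_blas:.6f}s")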

~~~
shaklee3
No, they're not. Straight from the horse's mouth:

[https://software.intel.com/en-us/mkl/features/benchmarks](https://software.intel.com/en-us/mkl/features/benchmarks)

In no case will you see it get close to 10TFLOPS. GPUs easily do this, and can
approach 100TFLOPS with tensor cores.
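
Rough peak-FP32 arithmetic behind that comparison (both configurations are
assumed purely for illustration: a 28-core AVX-512 Xeon vs. a Vega-class GPU):

    # Rough peak-FP32 estimates (all figures assumed for illustration):
    #   CPU: cores * clock * 2 AVX-512 FMA units * 2 FLOPs/FMA * 16 fp32 lanes
    #   GPU: shaders * clock * 2 FLOPs/FMA
    cpu_tflops = 28 * 2.5e9 * 2 * 2 * 16 / 1e12
    gpu_tflops = 3840 * 1.8e9 * 2 / 1e12
    print(f"CPU peak ~{cpu_tflops:.1f} TFLOPS, GPU peak ~{gpu_tflops:.1f} TFLOPS")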

~~~
Cybiote
I don't dispute this. If you can remove memory transfer overhead and meet the
requirements I mentioned, then GPUs will be much better. But there are many
problems that do not fit that (SIMD per warp) regime. In short, GPUs are no
panacea and have their own trade-offs and bottleneck sensitivities, just like
everything else.

~~~
shaklee3
I agree that they are no panacea, but we also have the example of the Intel
Xeon Phi, which did have high-speed on-package memory. That too did not
compete with GPUs. I think the majority of it has to do with the GPU chip
itself being extremely simple: no branch prediction, very simple caching
behavior, etc. GPUs are able to allocate a much larger portion of the die to
just throwing FMA units at the problem.

------
DCKing
In July 2015 AMD released a 28nm chip with more GCN compute units than this
card at a lower price - the Radeon R9 Fury X had the full 64 CUs / 4096
shaders enabled for $649. And it was considered a bit of a dud as a product.
This Radeon VII is 2.5 process nodes better, but launches with 60 CUs / 3840
shaders enabled. All still on essentially the GCN architecture. And it
launches at $699, more than three and a half years after the former.

Yet due to sheer clock-speed increases and HBM improvements (16GB, 1 TB/s,
wow), it actually seems surprisingly competitive. I'm both incredibly
impressed and incredibly underwhelmed.

~~~
dogma1138
Depends on how you define competitive. In gaming, which this card is at least
partially aimed at, it's about as fast as a 1080 Ti, a card launched two years
ago for $700.

~~~
CoolGuySteve
I'm hoping an 8 GB or 12 GB version comes out for $500 or so.

16 GB seems like overkill for the next couple years of game releases.

~~~
dogma1138
They can’t do 8 GB: there are apparently no 2 GB HBM2 modules, and they can’t
cut the number of modules without halving the memory bandwidth.

It’s essentially just to keep up some production volume. This is their HPC die
sold to the consumer market, AMD’s equivalent of the Titan V die minus the
tensor cores, just much, much cheaper.

It’s likely going to be a flop gaming-wise: 2x 8-pin power connectors put it
at a 300W TDP, with performance only matching a 1080 Ti/2080 in the best-case
scenarios AMD currently has.

The 1080 Ti can be had for under $600 if you can find it; the 2080 is around
$700, but you get RTX and DLSS, which, while not that useful yet, are going to
be more useful for gaming than 16GB of memory.

~~~
AlphaSite
The 2080 isn't great perf/w either, not like the 1080 was.

------
jyrkesh
> They are presumably not going to be able to match NVIDIA’s energy
> efficiency, and they won’t have feature parity since AMD doesn’t (yet) have
> its own DirectX Raytracing (DXR) implementation.

I assume this means that it won't support raytracing at all? Or will it, but
with some non-DXR implementation?

~~~
dragontamer
NVidia's raytracing is hardware accelerated. There are dedicated cores built
into the NVidia hardware that can ONLY traverse a BVH tree. Furthermore,
NVidia's raytracing features leverage FP16 matrix multiplication for
denoising.

The Radeon VII will certainly be slower at raytracing (if AMD ever decides to
support the feature). It has FP16 support and huge RAM bandwidth (which will
help traverse a BVH tree), but I would expect it to be many times slower than
dedicated hardware units. Without tensor cores (the VII only supports
hardware-accelerated FP16 dot products), there's no way it'd keep up with
NVidia on denoising or BVH traversals.
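
For anyone wondering what those RT cores actually accelerate, BVH traversal
boils down to a loop like this (a toy CPU-side sketch over spheres instead of
triangles, not how the hardware implements it):

    import numpy as np
    from dataclasses import dataclass

    # Toy BVH over spheres. The interesting part is the traversal loop: that
    # stack-walk over bounding boxes is what the RT cores run in fixed-function
    # hardware, and what a shader-based implementation has to do with divergent
    # branches and scattered memory reads instead.

    @dataclass
    class Sphere:
        center: np.ndarray
        radius: float

    @dataclass
    class Node:
        box_min: np.ndarray            # axis-aligned bounding box of the subtree
        box_max: np.ndarray
        left: "Node" = None
        right: "Node" = None
        sphere: Sphere = None          # set only on leaf nodes

    def hit_box(origin, inv_dir, box_min, box_max):
        # Standard slab test: ray vs. axis-aligned bounding box.
        t1, t2 = (box_min - origin) * inv_dir, (box_max - origin) * inv_dir
        return np.maximum(t1, t2).min() >= max(np.minimum(t1, t2).max(), 0.0)

    def hit_sphere(origin, direction, s):
        # Nearest positive hit distance along a unit-length direction, or None.
        oc = origin - s.center
        b, c = np.dot(oc, direction), np.dot(oc, oc) - s.radius ** 2
        disc = b * b - c
        if disc < 0:
            return None
        t = -b - np.sqrt(disc)
        return t if t > 0 else None

    def trace(origin, direction, root):
        inv_dir = 1.0 / np.where(direction != 0, direction, 1e-30)  # avoid div-by-zero
        best, stack = None, [root]
        while stack:
            node = stack.pop()
            if not hit_box(origin, inv_dir, node.box_min, node.box_max):
                continue                               # box miss: skip whole subtree
            if node.sphere is not None:                # leaf: test the primitive
                t = hit_sphere(origin, direction, node.sphere)
                if t is not None and (best is None or t < best):
                    best = t
            else:                                      # inner node: visit children
                stack.extend([node.left, node.right])
        return best

    # Two spheres on the z-axis; a ray from the origin should hit the nearer one.
    v = lambda *a: np.array(a, dtype=float)
    leaf1 = Node(v(-1, -1, 4), v(1, 1, 6), sphere=Sphere(v(0, 0, 5), 1.0))
    leaf2 = Node(v(-1, -1, 9), v(1, 1, 11), sphere=Sphere(v(0, 0, 10), 1.0))
    root = Node(v(-1, -1, 4), v(1, 1, 11), left=leaf1, right=leaf2)
    print(trace(v(0, 0, 0), v(0, 0, 1), root))         # -> 4.0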

~~~
CoolGuySteve
Based on the pixel sparkling in the video I've seen of Battlefield, it looks
like Nvidia's raytracing is being used for each pixel. I'm not sure if it has
to be that way or if it's just under-optimized at the moment.

I think you could get something 95% as good with far better performance by
only tracing vertices and then raster rendering the resultant transform.

Would be a good approach for accurate shadows too where you don't really care
about texture fragments.

~~~
dragontamer
> Based on the pixel sparkling in the video I've seen of Battlefield, it looks
> like Nvidia's raytracing is being used for each pixel. I'm not sure if it
> has to be that way or if it's just under-optimized at the moment.

1 sample per pixel is already incredibly low for raytracing. A raw 1 spp
raytraced image looks HORRIBLE.

The reason why it looks anywhere close to usable is because NVidia post-
processes all of that noise through their FP16 tensor cores. It is some kind
of spatial/temporal filter effect (averaging pixels across time and space...
where "space" is mapped to geometry framebuffers to keep object boundaries
correct).
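
The temporal half of that filter is conceptually just an exponential moving
average over frames; a toy sketch of the idea (the real denoiser also
reprojects with motion vectors and respects geometry buffers, which this
ignores):

    import numpy as np

    # Temporal accumulation as an exponential moving average: each noisy 1 spp
    # frame is blended into the history, which averages the raytracing noise
    # away over time (at the cost of ghosting on anything that moves).
    def accumulate(history, noisy_frame, alpha=0.1):
        if history is None:
            return noisy_frame
        return alpha * noisy_frame + (1 - alpha) * history

    truth = np.full((4, 4), 0.5)                 # stand-in for the converged image
    history = None
    for _ in range(60):                          # 60 noisy "frames"
        noisy = truth + np.random.normal(0.0, 0.3, truth.shape)
        history = accumulate(history, noisy)

    print(np.abs(history - truth).mean())        # noise shrinks as frames accumulate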

~~~
CoolGuySteve
Ya that's kind of what I'm getting at.

If your physically accurate rendering is pushing the hardware limits so far
that you need extensive post processing to maintain the framerate then it
stops being so physically accurate.

At that point, it seems like there are more interesting things you can do with
the hardware than a naive raytrace that would look and perform better while,
admittedly, not being 100% physically accurate.

~~~
lostmyoldone
I'd be very surprised if most common usage of RTX would turn out to be
anything even remotely physically accurate.

A more likely use in the immediate future is to simplify some parts of the
rendering pipeline and improve quality in a few others. Without looking too
much into it, improving and simplifying soft shadows and area lighting seems
like a no-brainer. Both are really hard to implement efficiently using
rasterization, and both need a good denoiser.

Will be interesting to see, though. Real-time ray tracing on commodity
hardware has arrived a bit later than I predicted, but what is a couple of
years over a few decades?

------
m0zg
It'd be super-nice if AMD also produced, at a minimum, fully supported
versions of PyTorch and TensorFlow for these. NVIDIA would crap its pants if
perf is comparable (and possibly drop the prices some).

~~~
greatpatton
It is just right here:

* [https://rocm-documentation.readthedocs.io/en/latest/Deep_learning/Deep-learning.html](https://rocm-documentation.readthedocs.io/en/latest/Deep_learning/Deep-learning.html)

* [https://github.com/ROCmSoftwarePlatform/tensorflow-upstream](https://github.com/ROCmSoftwarePlatform/tensorflow-upstream)
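
If you want to try it, the ROCm build installs from pip (assuming a supported
GPU and the ROCm driver stack are already set up); something like this should
confirm the GPU is visible:

    # pip install tensorflow-rocm    # the ROCm build of TensorFlow linked above
    import tensorflow as tf

    # Should report True (and log the Vega device) if ROCm and the port work.
    print(tf.test.is_gpu_available())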

~~~
m0zg
Do they post official benchmarks anywhere? I can only find third party
benchmarks of dubious veracity.

------
igravious
If it weren't for the alternatives of AMD† and Linux, God only knows how long
the average Joe would be trapped inside the nVidWintel Borg unit.

† and ARM one day: I'm talking desktop/laptop/server here…

------
vkaku
Good hardware. Wrong time to sell at this price point.

------
jMyles
Perhaps a good candidate for mining coins like LivePeer (one of 8-10 actual
solid blockchain projects IMO).

~~~
hughes
Proof-of-work coin mining is a scourge on the environment and the PC component
market.

~~~
jMyles
Transcoding videos is a scourge?!

I mean, maybe you think their choice to deploy as an ERC20 was a bad choice;
fine. But they're not adding any additional computation to the environment.

I don't understand how it's possible to think that LivePeer is bad for the
environment without thinking the same thing about, for example, Elastic
Transcoder. Your comment reads like a non-sequitur to me.

~~~
wmf
Presumably hughes is just making a knee-jerk comment about how 99% of PoW
systems are doing useless work. Personally I am very skeptical of any system
that tries to do proof of useful work like transcoding; it seems vulnerable to
cheating.

~~~
ericxtang
Hi wmf - I work on Livepeer, happy to clarify some of the skepticism here.

Livepeer uses cryptographic signatures to prove authenticity of the video
transcoding results, and that signature can be used as proof to punish
cheaters (if you signed the result, you are responsible for the correctness -
and since it's easy to validate whether the transcoding is done correctly,
anyone can use that signature to punish you by taking away your stake). If no
one cheats, no one gets punished. Since the cheater doesn't know how many
people are monitoring the results, their best strategy is to be honest.
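
Roughly the shape of that scheme, as a toy sketch (using PyNaCl for the
signature; Livepeer's actual message format and on-chain mechanics differ):

    import hashlib
    from nacl.signing import SigningKey        # pip install pynacl

    # Transcoder: hash the transcoded segment and sign the digest, staking its
    # identity (and bonded tokens) on that result.
    transcoder_key = SigningKey.generate()
    segment = b"...transcoded video bytes..."
    signed = transcoder_key.sign(hashlib.sha256(segment).digest())

    # Verifier: re-transcode (or spot-check) the segment. If the recomputed hash
    # doesn't match what was signed, the signature itself is the evidence used
    # to slash the transcoder's stake; if it matches, nobody gets punished.
    verify_key = transcoder_key.verify_key
    signed_digest = verify_key.verify(signed)  # raises BadSignatureError if forged
    assert signed_digest == hashlib.sha256(segment).digest()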

~~~
jMyles
Hey ericxtang. I've met your boy Phillip a few times (that's how I found out
about LivePeer) - solid guy. Very good ambassador for your project and all
that it can mean for the world.

