

Nvidia's new mobile superchip - flying_whale
http://www.theverge.com/2015/1/4/7492783/nvidia-unveils-its-latest-mobile-superchip-the-tegra-x1

======
trsohmers
Too bad NVIDIA's (theoretical) FLOP numbers are always single precision (which
is misleading if you are comparing it to supercomputers), as their theoretical
double precision FLOPS are always ~1/4th of their theoretical single precision
numbers. The other problem is that the CUDA shader cores are relatively far
from the ARM cores, which adds significant latency. While this isn't really a
problem for video rendering and other GPU tasks, it makes the chip
significantly worse for any sort of processing that involves a lot of random
accesses (most compute-heavy workloads). I don't get why NVIDIA tries to brag
about compute performance, which always underdelivers compared to what they
claim, when their chips are the best at what most end users actually care
about... media/video processing.

(Disclaimer: I am founder of a startup,
[http://rexcomputing.com](http://rexcomputing.com), working on a new processor
for high performance computing applications, which would be competing with
this chip for supercomputers, but not in any mobile/consumer tech.)

~~~
stuntprogrammer
In this case, I believe it's 1TF of FP16, or 500GFlops FP32. You're likely
looking at 16GF FP64.

I've also heard, though unconfirmed, that on the CPU side it's quad A57 + quad
A53 rather than Denver derivatives.

~~~
trsohmers
For theoretical FP64 FLOPS, it should be closer to 128 GFLOPS. Assuming a 10W
TDP, that works out to 12.8 GFLOPS/Watt, right in line with the (double
precision) GFLOPS/Watt ratio of pretty much all the top chips for 2015 (coming
from Intel, AMD, Nvidia, etc). I guess it is nice that it is an SoC, but I'm
not super impressed (then again, I am biased).

Anandtech ([http://www.anandtech.com/show/8811/nvidia-
tegra-x1-preview](http://www.anandtech.com/show/8811/nvidia-tegra-x1-preview))
is reporting that it is a big.LITTLE A57/A53 combo... I'm not completely
surprised that they went with off-the-shelf ARM cores instead of Denver, but
it makes me think they are putting Denver out to pasture (especially since
they say they are not going after the HPC or server markets with it). Denver
has some interesting code morphing (a la Transmeta) tech, but uses an in-order
instruction pipeline. That saves a lot of chip complexity, but decreases
performance in most use cases (though NVIDIA claims it is more efficient 80%
of the time for the average mobile user). Maybe NVIDIA determined that the
out-of-order execution the reference ARM A57/A53 cores provide is better for
the applications they seem to be targeting with the X1 (automotive).

~~~
stuntprogrammer
For your 128GF I take it you are assuming 1:4 with FP32. That would be
unexpected. E.g. in contrast to AMD parts, recent NV parts based on Maxwell2
(such as the GM204 in the GTX 980) are 1:32 for FP64:FP32. I would expect
that ratio to hold in this part, and the SMs are likely highly similar, giving
the ~16GF number I used.
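The two FP64 estimates in this thread come from applying different ratios to the same ~512 GFLOPS FP32 figure (both ratios are guesses in the discussion, not confirmed X1 specs):

```python
# Same assumed FP32 peak, two candidate FP64:FP32 ratios from the thread.
fp32_gflops = 512

fp64_at_1_to_4 = fp32_gflops / 4    # the 1:4 assumption -> 128 GFLOPS
fp64_at_1_to_32 = fp32_gflops / 32  # the Maxwell2 1:32 ratio -> 16 GFLOPS

print(fp64_at_1_to_4, fp64_at_1_to_32)  # 128.0 16.0
```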

I haven't seen the automotive workloads, but they strike me as repetitive and
regular enough that a Denver style part would do ok. That said, I'm not
shocked by the use of A57+A53.

(Disclaimer: I worked on an early version of Denver on the code morphing
software, and at an ARM server vendor on an A57-based SoC.)

------
mschuster91
Last decade's supercomputer(!) in today's pocket. Really, the sheer speed at
which computing technology evolves is mind-blowing. I don't even want to guess
where we are in ten years...

~~~
flying_whale
It's actually quite interesting to think about. With Intel hitting 14nm with
Broadwell, there's only so much further we can push things or shrink them; at
some point the scaling has to saturate. BUT that's where the interesting part
will be: the new revolutions that spring up, new manufacturing materials that
can defy the limits of the current ones.

