
Tegra K1 “Denver” Will Be First 64-bit ARM Processor for Android - pdknsk
http://blogs.nvidia.com/blog/2014/08/11/tegra-k1-denver-64-bit-for-android/
======
DCKing
What a peculiar design. They went for an in-order design instead of the
out-of-order design that has been standard in high-end ARM cores for
years. Out-of-order usually yields more instructions per clock cycle
("more efficient"), but it makes the chip more power-hungry and makes it
harder to reach high clock speeds.

So the in-order design they have now allows them to scale up to 2.5 GHz,
which is very high for such a constrained chip. They also have this weird
"code optimizer" that appears to sacrifice some RAM and a CPU core to
dynamically reorganize processor instructions so they run better on the
in-order design. It seems as though they want to have their cake and eat
it too, but at the same time they appear to paradoxically introduce
complexity to achieve simplicity.

I wonder if someone more knowledgeable about chip design could comment on
this?

~~~
_delirium
One of the more interesting bits of speculation about the design's motivations
that I've seen so far is an anonymous comment on the Slashdot version of this
story:
[http://hardware.slashdot.org/comments.pl?sid=5519563&cid=476...](http://hardware.slashdot.org/comments.pl?sid=5519563&cid=47653679)

~~~
gcp
This is another good post, despite the appeal to authority:
[http://hardware.slashdot.org/comments.pl?sid=5519563&cid=476...](http://hardware.slashdot.org/comments.pl?sid=5519563&cid=47654085)

Saying "When an in-order CPU stalls on memory, it's still burning power while
waiting, while an OOO processor is still getting work done" seems deceptive,
though. The OOO processor is (often) doing speculative work, which may be
unneeded, and a stalled in-order CPU won't be using quite the same amount of
power as if the execution units were actually switching.

Though I'm pretty sure the NVIDIA design must speculatively prefetch and
hoist memory reads aggressively to be performance-competitive (see the
huge cache sizes!), which also burns power.
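
For illustration, hand-written software prefetching in C looks roughly
like this; `__builtin_prefetch` is a GCC/Clang builtin, and the prefetch
distance of 16 is an arbitrary guess on my part:

    /* Request a[i + 16] while summing a[i], so the cache line is
       (hopefully) resident by the time the loop reaches it. Prefetch
       instructions don't fault, so running past the end is harmless. */
    long sum(const long *a, int n) {
        long s = 0;
        for (int i = 0; i < n; i++) {
            __builtin_prefetch(&a[i + 16], 0, 0); /* read, low locality */
            s += a[i];
        }
        return s;
    }

Each prefetch of data that never gets used is exactly the kind of
speculative work that burns power.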

------
walterbell
Background on Project Denver, including x86 vs ARM ISA:
[http://semiaccurate.com/2011/08/05/what-is-project-denver-ba...](http://semiaccurate.com/2011/08/05/what-is-project-denver-based-on/)

"Denver is just one of the variants in that line. T50 was going to be a full
64-bit x86 CPU, not ARM cored chip, but Nvidia lacked the patent licenses to
make hardware that was x86 compatible."

~~~
voidlogic
I'm surprised nVidia didn't buy VIA, or at least Centaur Technology from VIA,
to make this happen.

------
zurn
Neat. On Android it's easy to believe that Dalvik/ART-generated code
leaves more performance on the table than on other platforms.

Another "what's different this time" point: compared to the Transmeta
era, CPUs now typically have idle core(s) waiting for work to do. This
may enable more ambitious optimizations, since you're not constantly
stealing cycles from the app.

Does anyone have a link to the paper?

~~~
higherpurpose
Why do you lump Dalvik and ART together? The whole point of ART is that
application code is compiled to native code ahead of time, is it not?
Therefore it's "not leaving anything on the table".

Not sure why I'm being downvoted:

> _The big paradigm-shift that ART brings is that instead of being a
> Just-in-Time (JIT) compiler, it now compiles application code Ahead-of-
> Time (AOT). The runtime goes from having to compile from bytecode to
> native code each time you run an application, to having to do it only
> once, and any subsequent execution from that point forward is done from
> the existing compiled native code._

[http://www.anandtech.com/show/8231/a-closer-look-at-android-...](http://www.anandtech.com/show/8231/a-closer-look-at-android-runtime-art-in-android-l)

That sounds like native code compilation to me.

~~~
fulafel
Dalvik and ART both generate native code. ART does it at app install time and
Dalvik, being a JIT, does it at app runtime. He was speculating about quality
of the generated native code, not its existence.

Google has been very quiet about the quality of ART-generated code. I
haven't found any benchmarks of its performance either. It's probably
better than Dalvik but worse than mature optimizing compilers.

~~~
on_and_off
I don't think they have been very quiet about ART. They had a whole I/O
session on it, and they recently dedicated a whole dev-backstage podcast
to ART and its future evolution (by the way, some nice things are
planned: removal of the 65k method limit, code hot-swap, ...).

------
chrissnell
Oh, Transmeta. Back in the heyday of Slashdot, Transmeta had a massive
cult following for many months before the product was even released. I
remember the disappointment when they finally did ship. My dad bought one
of the little Sony laptops with their processor and it was a dog
(although power-efficient).

~~~
SixSigma
That was because Linus Torvalds, while he was employed there, did a
keynote speech with live streaming; he was pretty stoked himself.

The only detail I remember from it was:

"Linus, what Linux distribution do you use?"

"Debian, of course"

I wonder if he still uses that on his MacBook Air.

~~~
gcp
Whenever one of the most well-known open source technologists joins a
(pretty secretive) firm, it's going to generate interest and speculation.

Also remember John Carmack joining Intel to work on Larrabee.

~~~
wmf
Maybe you're thinking of Michael Abrash.

------
cheald
I asked this when Apple announced 64-bit support for the iPhone, and I'll
ask it again now: what is the point of 64-bit CPUs in devices that will
be deprecated and replaced before we get around to having more than 32
bits' worth of addressable RAM in them?

These aren't $1500 desktops with user-upgradable components. They are
sealed, non-upgradable, relatively disposable devices that are
intentionally obsoleted by their manufacturers every 12-18 months. It
makes for a nice marketing bullet point, but are we actually getting
anything besides a bigger number to impress the impressionable customer
with?

~~~
terhechte
It's not only about memory:

"In short, the improvements to Apple's runtime make it so that object
allocation in 64-bit mode costs only 40-50% of what it does in 32-bit mode. If
your app creates and destroys a lot of objects, that's a big deal." [1]

One example: on 64-bit iOS, NSNumber and (short) NSString objects can be
stored entirely within the pointer value itself (tagged pointers),
without allocating anything on the heap. That's possible because a 64-bit
pointer is large enough to contain the required information. Creating and
accessing one of these objects becomes far faster. Based on what I
gathered at WWDC this year, Apple is also inclined to move as many object
types there as possible (as long as they can be stored in a 64-bit
pointer).
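
The trick is easy to sketch in C. This is a toy encoding of my own, not
Apple's actual tag layout: real object pointers are aligned, so their low
bit is always zero, and a set low bit can therefore mean "the value lives
inline in the pointer itself":

    #include <stdint.h>
    #include <stdio.h>

    #define TAG_BIT 0x1u

    /* Box a small integer inside the pointer value itself: shift the
       payload up and set the tag bit. No heap allocation happens. */
    static uintptr_t box_int(int64_t v) {
        return ((uintptr_t)(uint64_t)v << 1) | TAG_BIT;
    }

    static int is_tagged(uintptr_t p) { return p & TAG_BIT; }

    /* Arithmetic right shift restores the signed payload (this is
       implementation-defined in C but universal in practice). */
    static int64_t unbox_int(uintptr_t p) {
        return (int64_t)p >> 1;
    }

    int main(void) {
        uintptr_t n = box_int(-42);
        if (is_tagged(n))   /* a real heap pointer would fail this */
            printf("inline value: %lld\n", (long long)unbox_int(n));
        return 0;
    }

Creating such an "object" is a couple of register operations instead of a
malloc, which is where the allocation speedup comes from.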

[1] [https://www.mikeash.com/pyblog/friday-qa-2013-09-27-arm64-an...](https://www.mikeash.com/pyblog/friday-qa-2013-09-27-arm64-and-you.html)

~~~
DCKing
It is worth noting that this is likely an iOS-on-AArch64-specific tweak.
Apple doesn't do this on x64 OS X (right?).

Furthermore, nothing has been said (AFAIK) about similar tricks Google is
doing with Android on AArch64 or 64 bit architectures in general. From what
I've heard they just made the Android runtime capable of emitting AArch64 (and
x64, MIPS64) instructions instead of 32-bit ARMv7 ones.

That alone should give a decent performance boost, as AArch64
instructions are supposedly quite a bit faster than the old 32-bit ones.
It remains to be seen, though, whether Google can make as much use of
clever allocation tricks as Apple can: Android is much more
platform-agnostic and uses a garbage collector for some of these tasks.

~~~
pohl
There is nothing iOS-specific about it. Mac OS X has been benefiting from
this optimization since 10.7.

~~~
DCKing
I stand corrected. So it's a feature of Apple's runtime on all 64-bit
systems.

------
shmerl
I really hope Nvidia will completely open up their K1 driver, following
Intel's approach.

~~~
pjmlp
Won't happen.

Intel 3D hardware is crappy anyway.

~~~
fulafel
Less crappy than many people think. Intel and AMD are about head-to-head
in game benchmarks. See e.g.
[http://www.anandtech.com/show/7677/amd-kaveri-review-a8-7600...](http://www.anandtech.com/show/7677/amd-kaveri-review-a8-7600-a10-7850k/12)

~~~
pjmlp
Those who have to program for them beg to differ:

[http://richg42.blogspot.de/2014/05/the-truth-on-opengl-drive...](http://richg42.blogspot.de/2014/05/the-truth-on-opengl-driver-quality.html)

~~~
brigade
He's programming for 200W+ dedicated AMD cards. It may surprise you that AMD
can't match that level of performance in their 65W SoCs either.

~~~
pjmlp
Last time I checked, even AMD's Brazos chips were better than Intel's
offering.

------
jbarrow
Considering the Tegra K1 is the first CUDA-capable mobile processor, does
anyone know if we'll be able to leverage CUDA bindings on devices that run on
this?

~~~
cvs268
Yes! CUDA on ARM has been available for quite some time now:
[http://devblogs.nvidia.com/parallelforall/cuda-arm-platforms...](http://devblogs.nvidia.com/parallelforall/cuda-arm-platforms-now-available/)

In fact, the most recent version of OpenCV already leverages this on ARM
platforms that support CUDA:
[http://code.opencv.org/projects/opencv/wiki/CARMA_platform_c...](http://code.opencv.org/projects/opencv/wiki/CARMA_platform_compilation_and_testing)
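
Assuming the K1 ships with the same CUDA C toolchain as the desktop parts
(which is what the CARMA page above suggests), kernels should look
entirely ordinary. A minimal sketch; nothing in it is K1-specific:

    #include <cstdio>
    #include <cuda_runtime.h>

    // Standard CUDA C saxpy: the same source targets the K1's Kepler
    // GPU as any desktop Kepler card.
    __global__ void saxpy(int n, float a, const float *x, float *y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            y[i] = a * x[i] + y[i];
    }

    int main() {
        const int n = 1 << 20;
        const size_t bytes = n * sizeof(float);
        float *hx = new float[n], *hy = new float[n];
        for (int i = 0; i < n; i++) { hx[i] = 1.0f; hy[i] = 2.0f; }

        float *dx, *dy;                      // device buffers
        cudaMalloc(&dx, bytes);
        cudaMalloc(&dy, bytes);
        cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(dy, hy, bytes, cudaMemcpyHostToDevice);

        saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, dx, dy);
        cudaMemcpy(hy, dy, bytes, cudaMemcpyDeviceToHost);

        printf("y[0] = %f\n", hy[0]);        // expect 4.0
        cudaFree(dx); cudaFree(dy);
        delete[] hx; delete[] hy;
        return 0;
    }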

------
msh
They say first 64-bit ARM for Android. I thought Qualcomm and MediaTek
already had 64-bit ARM CPUs?

~~~
masklinn
Qualcomm has 64-bit chips, but AFAIK there's no Android device using them
in the wild. The HTC "A11" was announced yesterday and will apparently
sport a Snapdragon 410; I don't know of any other devices yet.

Of course, there's no device using Denver in the wild either.

------
trevyn
Will this ship before the MediaTek MT6752/MT6795? They're claiming later this
year, too.

------
masklinn
Am I the only one weirded out that the supposed AnandTech quote appears
nowhere in the linked article, and that said article is nothing more than
the announcement and display of some NVIDIA marketing material?

