
Assembler for Nvidia Maxwell architecture - luu
https://github.com/NervanaSystems/maxas
======
dr_zoidberg
I once found the ISA[1] for the CUDA GPUs and always assumed Nvidia provided
an assembler for it. However, since GPGPU programming is nowhere near my line
of work, I never had the chance to check that assumption.

It's amazing that he took the time to make an assembler. Also, I'm left
wondering how much performance he can gain from his tool compared to the
Nvidia-supplied toolchain.

[1] [http://docs.nvidia.com/cuda/parallel-thread-execution/](http://docs.nvidia.com/cuda/parallel-thread-execution/)

~~~
zerohp
PTX is a virtual ISA that is not the same as the machine code that runs on the
device.

In his Introduction[1] wiki page, he says that he studied sgemm
implementations and concluded that Nvidia is not using PTX but an assembler
for the real ISA, which is not distributed to developers. He claims that his
sgemm implementation is almost 5% faster than Nvidia's and that it's faster
than anything that can be done in PTX.

[1]
[https://github.com/NervanaSystems/maxas/wiki/Introduction](https://github.com/NervanaSystems/maxas/wiki/Introduction)
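
For readers unfamiliar with the term, sgemm is the BLAS routine for single-precision general matrix multiply, C = alpha*A*B + beta*C. The following is only a minimal reference sketch of that operation, not the maxas kernel or Nvidia's implementation. The function name and the list-of-lists layout are assumptions made for illustration, and Python floats are double precision, so this shows the arithmetic rather than the precision.

```python
def sgemm(alpha, a, b, beta, c):
    """Return alpha * (a @ b) + beta * c for row-major lists of lists.

    Hypothetical helper for illustration only; a tuned GPU kernel tiles
    this loop nest heavily instead of running it as written.
    """
    m, k, n = len(a), len(b), len(b[0])
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            # Dot product of row i of A with column j of B.
            for p in range(k):
                acc += a[i][p] * b[p][j]
            out[i][j] = alpha * acc + beta * c[i][j]
    return out


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    c = [[0.0, 0.0], [0.0, 0.0]]
    print(sgemm(1.0, a, b, 0.0, c))
```

The inner loop is one multiply-add per step, which is why sgemm is a common benchmark for how close a kernel gets to a GPU's peak floating-point rate.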

~~~
userbinator
Looking at some of the details in
[https://github.com/NervanaSystems/maxas/wiki/Control-Codes](https://github.com/NervanaSystems/maxas/wiki/Control-Codes)
reminds me somewhat of the Itanium: a very wide architecture that is capable
of high throughput for specialised applications, but requires a lot of
software-level support to even work correctly (e.g., handling instruction
dependencies). The fact that it's not well-documented is another similarity.
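
To make the point about software-managed dependencies concrete, here is a toy sketch of what an assembler has to do when the hardware does not check dependencies itself. It does not model the actual Maxwell control-code encoding described on the wiki page; the `Instr` type, the opcode names, and the latency values are all hypothetical. It assumes an in-order, single-issue machine where each instruction carries a stall count telling the hardware how many cycles to wait before issuing the next one.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    op: str          # mnemonic, for readability only (hypothetical names)
    dest: str        # register written
    srcs: tuple      # registers read
    latency: int     # cycles until dest is readable (made-up values)


def assign_stalls(program):
    """Return, per instruction, the cycles to wait before issuing the next.

    The hardware in this toy model never checks dependencies, so the
    stall on instruction i must cover every source of instruction i + 1.
    """
    ready = {}   # register -> cycle at which its value becomes readable
    issue = 0    # issue cycle of the current instruction
    stalls = []
    for i, ins in enumerate(program):
        ready[ins.dest] = issue + ins.latency
        stall = 1  # minimum: the next instruction issues one cycle later
        if i + 1 < len(program):
            nxt = program[i + 1]
            needed = max((ready.get(r, 0) for r in nxt.srcs), default=0)
            stall = max(1, needed - issue)
        stalls.append(stall)
        issue += stall
    return stalls


if __name__ == "__main__":
    program = [
        Instr("LD", "r0", (), 4),
        Instr("LD", "r1", (), 4),
        Instr("FMUL", "r2", ("r0", "r1"), 2),
        Instr("FADD", "r3", ("r2", "r0"), 2),
        Instr("MOV", "r4", (), 1),
    ]
    print(assign_stalls(program))
```

If the assembler gets a stall count wrong in a model like this, the program still runs but reads stale register values, which is the sense in which software support is needed "to even work correctly".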

It would be great if Nvidia released more documentation, because chances are
developers could squeeze even more performance out of their hardware that way.

~~~
dr_zoidberg
From what I understand, currently almost all GPU architectures* are some
variant of VLIW, with some "tricks" to get the processing cores to work with
high efficiency (that is, use all the cores available for a given task).

A few years ago Michael Abrash wrote a wonderful article[1] on the Larrabee
project, by Intel, which ultimately resulted not in a graphics card, as
intended, but in a "processing card"[2][3]. Of course, anything you read by
Abrash on that topic will be more than interesting.

* Intel HD GPUs may or may not be VLIW; I have never been able to find detailed specs of their architecture.

[1] [http://www.drdobbs.com/parallel/a-first-look-at-the-larrabee...](http://www.drdobbs.com/parallel/a-first-look-at-the-larrabee-new-instruc/216402188)

[2]
[http://en.wikipedia.org/wiki/Larrabee_%28microarchitecture%2...](http://en.wikipedia.org/wiki/Larrabee_%28microarchitecture%29)

[3]
[http://en.wikipedia.org/wiki/Xeon_Phi](http://en.wikipedia.org/wiki/Xeon_Phi)

~~~
brigade
I was about to say "not really, the only current VLIW GPUs are ARM's Mali" but
after reading more of the documentation I guess you're kind of right...
Maxwell requires explicitly encoding when to dual-issue instructions, which is
basically the defining feature of VLIW.

That said, VLIW is designed/intended for architectures that are highly
superscalar within a single thread of execution. GPUs have eschewed that model
in favor of simply executing more threads. So a better (simplistic) model for
viewing modern GPUs is "AVX-1024 with massive hyperthreading".
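
A rough way to see why "more threads" can substitute for superscalar issue is a toy simulation of latency hiding. This is a simplified sketch, not a model of any real GPU: the `simulate` function, the single issue slot, and the fixed latency are all assumptions. Each "instruction" stands in for one wide vector operation (1024 bits would be 32 lanes of 32-bit floats), and each hardware thread must wait `latency` cycles between its own dependent instructions.

```python
def simulate(num_warps, instrs_per_warp, latency):
    """Return (total_cycles, utilization) for a toy single-issue core.

    Each warp runs a chain of dependent instructions, so it can issue
    again only `latency` cycles after its previous issue. Every cycle
    the core issues from the first warp that is ready, if any.
    """
    remaining = [instrs_per_warp] * num_warps
    ready_at = [0] * num_warps   # cycle at which each warp may issue again
    total = num_warps * instrs_per_warp
    cycle = 0
    issued = 0
    while issued < total:
        for w in range(num_warps):
            if remaining[w] > 0 and ready_at[w] <= cycle:
                remaining[w] -= 1
                ready_at[w] = cycle + latency
                issued += 1
                break   # only one instruction issues per cycle
        cycle += 1
    return cycle, issued / cycle


if __name__ == "__main__":
    for warps in (1, 2, 4):
        cycles, util = simulate(warps, instrs_per_warp=4, latency=4)
        print(warps, cycles, round(util, 2))
```

With one thread the issue slot sits idle most of the time; once the number of threads matches the latency, the slot is busy every cycle without any instruction-level parallelism inside a single thread.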

~~~
dr_zoidberg
And with that cue, you gave me room to post another article by Michael Abrash
on the Larrabee architecture (which is pretty much what you described as a
model):
[http://www.drdobbs.com/parallel/rasterization-on-larrabee/21...](http://www.drdobbs.com/parallel/rasterization-on-larrabee/217200602)

This thread has been wonderful for reading all these amazing things about
different architectures, hardware and software, but I should get back to work.
I leave with a lot more to read this coming weekend! :)

------
codys
In case anyone is wondering, "Maxwell" is Nvidia's current shipping GPU
microarchitecture:
[https://en.wikipedia.org/wiki/Maxwell_%28microarchitecture%2...](https://en.wikipedia.org/wiki/Maxwell_%28microarchitecture%29)

------
thisrod
Good work.

