
HIP – Convert CUDA to Portable C++ - reimertz
https://github.com/ROCm-Developer-Tools/HIP
======
waynecochran
When I write CUDA kernels I am very specific in optimizing for nVidia GPU
features:

    
    
         * I have 1024 threads per block
         * I have 48KB of shared memory per block
         * I have 32 threads per warp and need to make sure that 
           global-to-local memory reads are coalesced.
         * Think SIMD and avoid branching as much as possible
    

My kernels usually follow a specific pattern:

    
    
         (1) Read global memory into local memory: making sure that
             if thread i reads memory[n], then thread i+1 
             reads memory[n+1].
         (2) __syncthreads().
         (3) Do computation in the most thread-balanced way possible.
    

This very specific pattern doesn't really work elsewhere. In fact, optimizing
in this fashion and then porting to C++ or elsewhere loses the specific
optimization. Programming in a more general way loses all the things that make
the program fast. Anyway, I'm definitely going to look this over more.
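
That pattern can be sketched as a small kernel (my own hypothetical example,
assuming blockDim.x is a power of two; not code from the article):

```cuda
// Sketch of the pattern above: coalesced load, barrier, balanced compute.
__global__ void blockSum(const float *in, float *out, int n) {
    __shared__ float tile[1024];  // 4KB, well under the 48KB-per-block limit

    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // (1) Coalesced read: thread i loads in[i], thread i+1 loads in[i+1].
    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;

    // (2) Barrier: make the whole tile visible to every thread in the block.
    __syncthreads();

    // (3) Balanced compute: tree reduction; active threads do equal work,
    //     and whole warps drop out together, minimizing divergence.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride)
            tile[threadIdx.x] += tile[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        out[blockIdx.x] = tile[0];
}
```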

~~~
dragontamer
AMD's "HCC" is C++ with templates added onto it. Unlike CUDA, HCC implements
all relevant features in C++ Templates alone.

If you simply use 64 threads per "warp" (AMD's wavefronts are 64 threads
wide) and 32KB of LDS aka shared memory per block, you would be able to
write portable high-performance code between AMD GPUs and NVidia GPUs.
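
As a sketch, the "least common denominator" configuration described above
might look like this (my own illustrative constants, not an official
recommendation):

```cuda
// Portable sizing: take the smaller of each vendor's limits.
constexpr int kBlockThreads = 64;        // one AMD wavefront = two NVidia warps
constexpr int kSharedBytes  = 32 * 1024; // fits AMD's 32KB LDS and NVidia's 48KB

// A launch bounded this way would then look like:
//   kernel<<<numBlocks, kBlockThreads, kSharedBytes>>>(args...);
```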

AMD seems like it's a bit behind with regards to GPGPU adoption. But AMD's
hardware seems to be a good bit cheaper. You can get HBM2 models at ~$2000
from AMD (the Firepro WX9100, which is Vega architecture).

Although... as they say... hardware is cheap. I'm sure most datacenters will
prefer the $8000 NVidia V100 instead, because there are more people using that
hardware. In particular, it's easier to get started with a V100 due to AWS and
other cloud-compute offerings.

------
tromp
I wonder if this performs well enough to port cryptocurrency miner software. I
hope to find out with

[https://github.com/mimblewimble/grin/issues/806](https://github.com/mimblewimble/grin/issues/806)

------
Wehrdo
Could anyone who's used HIP comment on how it compares to programming in raw
CUDA?

I considered using it for a project recently, but ultimately decided against
it because I didn't need to be able to run on AMD systems.

~~~
waynecochran
Wait ... if you write your program in HIP, you get both CUDA and OpenCL (which
includes AMD) for free, right?

~~~
dragandj
There is no OpenCL in the story there. AMD created HIP as a direct copy of
CUDA. It then gets compiled to work on AMD hardware, or can be translated to
CUDA.

AMD _also_ supports OpenCL (which I prefer to both CUDA and HIP), but it's not
connected to HIP.
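
To illustrate how direct the copy is, the translation is largely a rename of
the runtime API; a non-exhaustive sketch of typical hipify-style mappings:

```cuda
// CUDA runtime call                   ->  HIP equivalent
//   cudaMalloc(&ptr, bytes)           ->  hipMalloc(&ptr, bytes)
//   cudaMemcpy(dst, src, bytes, kind) ->  hipMemcpy(dst, src, bytes, kind)
//   cudaDeviceSynchronize()           ->  hipDeviceSynchronize()
//   kernel<<<grid, block>>>(args)     ->  hipLaunchKernelGGL(kernel, grid,
//                                           block, 0 /*shared*/, 0 /*stream*/,
//                                           args)
```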

~~~
TomV1971
Last time I checked, HIP didn’t support reading from textures.

That’s something that’s not only useful for pure graphics.

So it is (or was?) not a straight substitute for CUDA.

------
ipunchghosts
The fact that HIP exists is why we chose CUDA for our program 8 years ago.
OpenCL and CUDA were the choices. I thought about how I would design such a
GPU language and then looked at CUDA and OpenCL. What I dreamt up matched CUDA
exactly. I forsake one day that it would no longer matter what you wrote in
CUDA because it could be easily translated to any type of GPU hardware. In the
past few years, the Portland Group made a CUDA-to-CPU compiler, and now HIP.

~~~
aerodog
*foresee:)

~~~
mastazi
Mmmhhh I would say "foresaw" ;-)

~~~
ipunchghosts
Sent from my Android.

------
mamon
So, the logical next step would be what? Apply that tool to Tensorflow to make
it run on AMD GPUs?

~~~
TomVDB
HIP is only useful to convert CUDA source code.

Tensorflow uses the cuDNN library, which is closed source. There is nothing
for HIP to convert.

~~~
tgtweak
AMD is working on a cuDNN compatibility layer IIRC (MIOpen), and its ROCm
group has created a CUDA transpiler (to intermediary HIP, then to AMD binary
via HCC).

[https://instinct.radeon.com/en/6-deep-learning-projects-amd-radeon-instinct/](https://instinct.radeon.com/en/6-deep-learning-projects-amd-radeon-instinct/)

~~~
mastazi
Wait, I thought that AMD's cuDNN replacement was hipDNN, now I'm confused
[https://github.com/ROCmSoftwarePlatform/hipDNN](https://github.com/ROCmSoftwarePlatform/hipDNN)

Edit: OK, I was reading the docs and I think I got it: hipDNN is a wrapper
that (once finished) will search and replace calls from cuDNN to hipDNN;
hipDNN itself, in turn, will call MIOpen. Not sure if that's right; I would
appreciate it if someone who knows more could confirm.

~~~
nl
AMD is always working on random half-assed attempts to get to parity with
NVidia for neural network training.

If they'd chosen one approach 5 years ago and put decent resources behind it
they might be competitive by now.

------
petermcneeley
Is there a legal question here, as CUDA is a proprietary API?

~~~
O_H_E
That is an interesting question, but I don't think so, because that repo
shouldn't contain anything related to the CUDA binaries.

(Do NOT take my word for it, I have no idea about what I am talking about)

~~~
petermcneeley
Obviously I'm thinking of this type of legal issue:
[https://www.theregister.co.uk/2017/06/09/intel_sends_arm_a_s...](https://www.theregister.co.uk/2017/06/09/intel_sends_arm_a_shot_across_bow/)
Emulation is simply an implementation of an API.

~~~
bringtheaction
Also Oracle vs Google in their legal battle about the use of Java APIs in
Android.
[https://en.wikipedia.org/wiki/Oracle_America,_Inc._v._Google...](https://en.wikipedia.org/wiki/Oracle_America,_Inc._v._Google,_Inc%2E)

Google won out over Oracle in the end, but it took a long time (from 2012
until 2016; four years of court cases), with some courts finding that API
structures were copyrightable and others finding that they were not, or that
reimplementation was fair use. I guess we now have a precedent thanks to this,
but it could still be an issue? IANAL so I don't know.

~~~
cma
Oracle may not have been able to use SQL anymore if the ruling had gone in
their favor.

~~~
O_H_E
Could you elaborate???

~~~
bitL
Somebody speculated that the lawsuit was used by Oracle to establish firm
boundaries in what can be considered a copyright infringement, with the intent
to implement Amazon/GCE APIs for drop-in compatibility for their cloud
offering. So they tried all kinds of ridiculous stuff to see what works/what
doesn't in order to properly cover their backs while "stealing" other APIs.

~~~
pjmlp
Given that Google helped sink Sun by ripping them off, and Sun wasn't in the
position to pay for lawyers, to apply the same medicine that they did to
Microsoft....

James Gosling interview, at 57:42

[https://www.youtube.com/watch?v=ZYw3X4RZv6Y](https://www.youtube.com/watch?v=ZYw3X4RZv6Y)

"unwilling to help us pay the bills", so nice for the Do No Evil company.

Oracle does pay for their ANSI/ISO SQL certifications.

