
KGPU - Augmenting Linux with the CUDA GPU - deutronium
https://github.com/wbsun/kgpu
======
gnu8
Why is this CUDA and not OpenCL? There's no reason to legitimize Nvidia's
proprietary nonsense; it just enables their bad behavior.

~~~
fool
The GPU-specific code seems to be relatively isolated, confined to the memory
operations in gpuops.cu. I'm not altogether sure, but a quick review suggests
that such mapping is supported by OpenCL, so one could rewrite that module and
the whole thing would work without CUDA. Of course, in terms of compilers it
is going to be a while before users can move away from nvcc for Nvidia
graphics card support.

However, as with the parent, I'd really like to see a generic OpenCL
vectorization kernel module. I have a strong suspicion that this work was
directly or indirectly underwritten by Nvidia, so I guess someone (Intel?)
needs to step up and fund similar academic projects.

~~~
fool
Suspicion confirmed:

"KGPU is a project of the Flux Research Group at the University of Utah. It is
supported by NVIDIA through a graduate fellowship awarded to Weibin Sun."

<http://code.google.com/p/kgpu/>

~~~
dfc
It really makes you wonder why AMD does not do the same thing; the hardware is
essentially free, and how much could a fellowship cost? Does anyone know if
research grants like this are a tax write-off?

------
greggman
You do NOT want to do this. Unlike the CPU, which can be preempted, current
GPUs cannot. If you give them 2-30 minutes of work to do, they will not return
until that work is done. Windows gets around this by resetting the GPU if it
doesn't respond for more than a few seconds. On Linux/OS X, no such luck, at
least not yet.

But once reset, the state of the GPU is often unknown. Not a good thing if you
are embedding GPU code into your kernel.

------
adamgravitis
Where is this useful? The bus speed across to the GPU is so slow that I
thought it was only meaningful for near-autonomous operations.

~~~
Tuna-Fish
Right now, the bandwidth to modern GPUs is actually pretty decent (16 GB/s
bidirectional), but the latency is still horrid. This means that you need
rather large operations for offloading to pay off. I think doing RAID-5 or
full-disk encryption with large blocks might just _barely_ be worth it.
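
To put rough numbers on it (all the rates below are illustrative guesses, not
measurements), you can model the break-even point as fixed latency plus a bus
crossing each way versus just doing the work on the CPU:

    #include <stdio.h>

    /* Back-of-envelope offload model. Every rate here is an
     * illustrative guess, not a measurement. */
    int main(void)
    {
        const double bus_bw   = 16e9;  /* bus bandwidth, bytes/s */
        const double latency  = 20e-6; /* fixed round-trip launch+copy latency, s */
        const double cpu_rate = 1e9;   /* CPU crunches 1 GB/s (e.g. software crypto) */
        const double gpu_rate = 8e9;   /* GPU crunches 8 GB/s once data is there */

        for (double bytes = 4096; bytes <= 64.0 * 1024 * 1024; bytes *= 16) {
            double t_cpu = bytes / cpu_rate;
            /* data has to cross the bus twice: in and back out */
            double t_gpu = latency + 2 * bytes / bus_bw + bytes / gpu_rate;
            printf("%10.0f KiB: cpu %9.1f us, gpu %9.1f us -> %s\n",
                   bytes / 1024, t_cpu * 1e6, t_gpu * 1e6,
                   t_gpu < t_cpu ? "offload wins" : "CPU wins");
        }
        return 0;
    }

With those guesses the crossover lands in the tens-of-kilobytes range, which
is why only large-block workloads like RAID or disk encryption come close to
paying off.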

However, with AMD and Intel integrated GPUs, this is about to change. AMD is
doing a lot of work on HSA, which can be summarized as "GPU and CPU share the
same memory and can communicate by passing pointers". I can see this kind of
work being really useful in the near future.

~~~
dfc
More and more CPUs have AES instructions, and my old Lenovo IdeaPad has a
crypto coprocessor. Do you think GPU offload will be worth it when the system
has hardware-accelerated crypto?

~~~
tjoff
The question should be whether dedicated hardware for accelerated crypto will
be worth it when GPU offload is suitable for it.

Although in that particular context (security), you might enjoy the isolation
of dedicated hardware as opposed to sharing it with others. The GPU solution,
though, of course has the advantage of being able to adapt to new ciphers,
etc.

~~~
dfc
Adapting/implementing the newest and hottest cipher on the block is not
something that the crypto community advocates. Do you really think
crypto-accelerated hardware is going to fall behind and not support the
ciphers that the crypto community (academia/industry) endorses?

~~~
tjoff
I've been waiting for, what, a decade, since VIA introduced PadLock, to get
hardware-accelerated encryption in mainstream CPUs. And just recently, basic
support has been introduced, but only for AES and nothing else (and if I'm not
mistaken (I probably am), PadLock is vastly superior to the offerings of AMD
and Intel :P).

So yes, crypto-accelerated hardware is behind and does not support the ciphers
that the crypto community (academia/industry) endorses, and in all likelihood
it will never bother to catch up, since doing it on the GPU will be good
enough. Even if it takes another decade.

~~~
dfc
What algos are you missing?

The AES-NI instruction set was proposed in 2008, and the first Intel CPUs with
it started shipping almost three years ago.[1] Soekris has had the vpnXXXX
crypto accelerators for as long as I can remember.[2]

[1] <http://ark.intel.com/search/advanced/?s=t&AESTech=true>
[2] <http://soekris.com/>
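
For what it's worth, you can check for it yourself: AES-NI is advertised in
CPUID leaf 1, ECX bit 25. A minimal sketch for x86 with GCC/Clang:

    #include <stdio.h>
    #include <cpuid.h>   /* GCC/Clang CPUID wrapper, x86 only */

    int main(void)
    {
        unsigned int eax, ebx, ecx, edx;

        /* CPUID leaf 1: feature flags. AES-NI is ECX bit 25. */
        if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) && (ecx & (1u << 25)))
            puts("AES-NI supported");
        else
            puts("AES-NI not supported");
        return 0;
    }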

~~~
tjoff
Blowfish or Twofish wouldn't hurt. But I'd be happy with AES; too bad none of
my devices have hardware acceleration for it.

The fact that it was proposed in 2008 is quite telling by itself. And when
Intel introduced it, it was in their high-end product lines; to find a
processor where AES-NI is less needed would be a challenge.

A suitable integrated GPU would penetrate the market much better and
ultimately support products which today use the Atom processor. The very same
product segment where you can barely use encryption today (in contrast to an
i7, which saturates a fast SSD with AES encryption throughput without breaking
a sweat, even without hardware acceleration).

A similar solution would also most likely allow me to encrypt files on my
phone without a large impact, even if the manufacturer couldn't care less
about security features.

------
mtgx
Would this be possible with OpenCL, too?

------
odranoelson
I wonder whether this will conflict with other user applications using the
GPU.

------
wavesum
I don't understand :)

Could someone explain what this kind of technology means in practice? Does
this mean I can GPU-accelerate my old code?

