

KGPU — Accelerating Linux Kernel Functions with CUDA - Tsiolkovsky
http://code.google.com/p/kgpu/

======
onan_barbarian
This is almost certainly nuts.

The latency to and from GPUs is awful. I've been hacking GPGPU since before
CUDA existed (worked with an early beta of CUDA and the GeForce 8800 GTX). You can do
some great throughput-oriented stuff there, but the latency issues meant that
GPUs were useless for small-to-medium tasks. It's partly an issue of latency
and also an issue of getting 'enough data to make parallelism useful' - we had
a pattern matching task that hit peak throughput at about 16K threads which
required, of course, a big pile of data.

Things may have improved in terms of latency since then, but we're talking
multiple orders of magnitude off (back then) for network processing tasks,
much less kernel tasks. And the issue of needing 'big piles of data' to work
over isn't going away. This is algorithm-dependent, of course, but lots of
data is the easy way to find data-parallelism. :-)
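The argument above can be sketched with a toy cost model: a fixed host-device round trip dominates until the payload (and the parallel work it enables) is large enough to amortize it. All constants below are illustrative assumptions, not measurements of any particular card.

```python
# Toy model: GPU time = fixed round-trip overhead + bus transfer + compute.
# Constants are assumptions for the sketch, not measured values.

ROUND_TRIP_OVERHEAD_S = 20e-6    # assumed kernel-launch + PCIe round-trip, ~20 us
PCIE_BANDWIDTH_BPS = 4e9         # assumed ~4 GB/s effective transfer rate
GPU_SPEEDUP = 10.0               # assumed raw-compute speedup over the CPU

def gpu_time(payload_bytes, cpu_compute_s):
    """Total GPU-side time: fixed overhead + transfer + (faster) compute."""
    transfer = payload_bytes / PCIE_BANDWIDTH_BPS
    return ROUND_TRIP_OVERHEAD_S + transfer + cpu_compute_s / GPU_SPEEDUP

# One 1500-byte packet needing 2 us of CPU work: the GPU wins on
# compute alone but loses badly once overhead is counted.
print(gpu_time(1500, 2e-6) > 2e-6)    # True: overhead swamps the gain

# A 16 MB batch needing 20 ms on the CPU: overhead and transfer are
# amortized and the GPU comes out ahead.
print(gpu_time(16e6, 20e-3) < 20e-3)  # True
```

Under these (assumed) numbers the crossover is entirely about batch size, which is exactly the "big piles of data" point: small, latency-sensitive kernel tasks sit on the wrong side of it.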

Most of the papers published about GPGPU are either larger data sizes (where
GPGPU is a legitimate tactic) or do some really serious handwaving concerning
latency (e.g. the GPU routing stuff, 'packetshader'). Just because there are a
bunch of people who can get interesting papers out of it doesn't mean it's a
good idea.

------
dougb
Sandia uses GPUs for calculating parity in software RAID,
<http://www.computer.org/portal/web/csdl/doi/10.1109/ICPP.2010.64>

------
goalieca
Well, other than crypto, what is there really to do that is more efficient on
the GPU?

~~~
wmf
Routing: <http://shader.kaist.edu/packetshader/>

This was really non-obvious to me.

~~~
onan_barbarian
It's non-obvious because it's a bad idea. This paper comes from a wacky world
where latency and power consumption don't matter. The comparisons between CPU
vs. GPU aren't that compelling just on the surface of it. The latency/power
consumption numbers (compared to dedicated ASICs for this sort of thing) are
just laughable.

Being the most compelling 'software router' is sort of like being the 'tallest
midget', but even in this domain, I think their alleged advantages over CPU-
only are mainly due to carefully massaging the presentation of the data.

------
IvoDankolov
Ever since I first heard about CUDA and ran the N-body simulation with
hundreds of bodies in real time on a cheap GeForce 8500, I've been aware of
how good a parallel processor these things are.

Of course, had I learned how the rendering pipeline for graphics works, it
would have been obvious, but that only came later.

After that, I've always wondered how feasible it would be to write an OS for
regular users (there are CUDA supercomputers, but that isn't very
representative of how most people make use of computers) that makes use of the
GPU for various computations other than graphics. Hopefully, this project will
shed some light in that direction.

------
rbanffy
What are the differences between using CUDA and OpenCL in this scenario? I was
under the impression CUDA is Nvidia-only.

~~~
wmf
Yeah, basically OpenCL is a standardized version of CUDA.

------
powertower
Matrix multiplication is the only thing GPUs are useful for... Basically,
parallel mathematical operations.

It's interesting, but the applications to this are seriously limited and very
specific.

Graphic apps already use the GPU. The OS mostly keeps data structures and
calls functions.

~~~
michael_h
That may be a bit harsh. If your input size is sufficiently large, I would bet
that encryption would get a pretty big boost. I know convolution operations
are a lot faster (not sure where this would happen in a kernel though).

I think people often forget to factor in data transfer time. Matrix
multiplication is 98x faster on my GPU than on my CPU, but I don't actually
break even on real-world time until the dimensions are up around 2000 or so.
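That break-even point can be estimated with a simple cost model: count 2n^3 flops on each side, plus the cost of shipping three n x n double matrices (A, B, and the result) across the bus. Every constant below is an assumption for illustration; the real crossover depends entirely on the hardware and BLAS in use.

```python
# Illustrative break-even model for an n x n GPU matrix multiply.
# All rates and overheads are assumed values, not measurements.

CPU_FLOPS = 5e10            # assumed ~50 GFLOP/s multithreaded CPU BLAS
GPU_FLOPS = 98 * CPU_FLOPS  # "98x faster" on raw compute, per the comment
BUS_BPS = 1e9               # assumed ~1 GB/s effective host-device bandwidth
OVERHEAD_S = 1e-3           # assumed per-call driver/launch overhead

def cpu_time(n):
    # 2*n^3 flops for a naive flop count of n x n matrix multiply
    return 2 * n**3 / CPU_FLOPS

def gpu_time(n):
    # ship A, B, and the result: three n x n matrices of 8-byte doubles
    transfer = 3 * n * n * 8 / BUS_BPS
    return OVERHEAD_S + transfer + 2 * n**3 / GPU_FLOPS

# Smallest n (scanned in steps of 16) where the GPU wins end-to-end.
break_even = next(n for n in range(16, 100000, 16) if gpu_time(n) < cpu_time(n))
print(break_even)
```

Because compute grows as n^3 while transfer grows only as n^2, the GPU always wins eventually; the interesting question is where the crossover lands for your constants, and with small matrices the fixed overhead alone decides it.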

~~~
woodson
True. A talk I recently attended quoted results from a Gammatone filterbank
implementation for audio processing using CUDA (which was properly
parallelized) and compared it with IPP (Intel Integrated Performance
Primitives) on the CPU; the IPP version turned out to be a lot faster, due to
the cost of transferring the data to the GPU in the CUDA implementation.

------
nwmcsween
Honestly, I don't like Linux, FreeBSD, Windows, etc., where everything under
the sun is stuffed into a privileged domain and is stagnating to the point
where hardware that was common 10 years ago is only just becoming useful
w.r.t. the kernel. I can only hope that something new will come along that
simply multiplexes the hardware and provides abstractions as a 'library',
without having to hack through n subsystems to add feature x.

~~~
jrockway
What's your plan for helping with this effort?

~~~
nwmcsween
Eventually finish working on a programming language that can facilitate this.
Probably create something similar to MIT's old exokernel research project.

~~~
alecco
Internet tough _programmer_?

