
GPU Ray Tracing the Wrong Way - ingve
http://www.joshbarczak.com/blog/?p=1197
======
fulafel
Makes you wonder how many cool things people would have written for the native
Intel GPU instruction set if supported interfaces for it existed. All the
programming information has been out there in Intel-provided open source
drivers for many, many years.

~~~
nn3
I don't think that person wrote his own assembler, so there must be existing
assembler toolkits for GEN already.

Some googling finds [https://software.intel.com/en-us/articles/introduction-
to-ge...](https://software.intel.com/en-us/articles/introduction-to-gen-
assembly)

~~~
exDM69
He does mention writing an assembler called HAXWell, and there are code
snippets of it in the article.

------
wyager
Not to be pessimistic, but that was a _lot_ of effort to get what looks like
~25% improvement over just using the CPU. Is this a typical performance
profile for doing ray tracing in OpenGL? I expected tens of times faster than
a CPU on commodity hardware.

~~~
wongarsu
> Is this a typical performance profile for doing ray tracing in OpenGL?

Ray tracing on the GPU is usually a lot faster than on the CPU. The problem is
that the post is comparing the performance of an Intel Core i3-4010U with the
performance of that CPU's integrated GPU, an Intel HD 4400.

Some benchmarks: A GeForce GTX 1080, as a current high-end GPU, has a PassMark
score of 12,618, while an Intel HD 4400 has a PassMark score of 546. In the
3DMark11 benchmark, a GTX 1080 reaches 24,390 points, while the Intel HD 4400
reaches 740 points.
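For scale, a quick back-of-the-envelope sketch using the scores quoted above
(the benchmark names and numbers come from the comment; the code itself is
just illustrative arithmetic):

```python
# Relative performance implied by the quoted benchmark scores
# (higher is better in both suites).
passmark = {"GTX 1080": 12618, "Intel HD 4400": 546}
threedmark11 = {"GTX 1080": 24390, "Intel HD 4400": 740}

passmark_ratio = passmark["GTX 1080"] / passmark["Intel HD 4400"]
threedmark_ratio = threedmark11["GTX 1080"] / threedmark11["Intel HD 4400"]

print(f"PassMark ratio: {passmark_ratio:.1f}x")    # ~23.1x
print(f"3DMark11 ratio: {threedmark_ratio:.1f}x")  # ~33.0x
```

So by either measure the HD 4400 has roughly 1/23 to 1/33 the throughput of a
contemporary high-end discrete GPU, which puts the ~25% speedup over the CPU in
context.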

So what we're really seeing is that using a pretty bad GPU, designed by a CPU
manufacturer, is barely faster than using the CPU. But using that GPU as if it
were a CPU is notably faster than using it via a graphics shader interface.

~~~
jrk
I quibble with your label of "bad GPU," though the relative magnitude of
resources and performance is right. Modern GEN is actually a good-to-great GPU
architecture (an assessment widely shared in the GPU architecture community
today), it's just not given many resources compared to a large discrete GPU,
especially in lower-end SKUs like this.

Work like this is interesting because it digs into ways—at the architectural
level, well below the standard programming models of OpenGL/D3D/OpenCL—in
which GEN is a potentially _more_ efficient general-purpose programming target
than NVIDIA architectures, but is hamstrung by a commitment to a programming
model (OpenCL) that is strongly tied to an NVIDIA-style execution model.

