

Tim Sweeney: The end of the GPU roadmap [pdf] - simonb
http://graphics.cs.williams.edu/archive/SweeneyHPG2009/TimHPG2009.pdf

======
nvoorhies
One thing that didn't seem to be addressed in the slides themselves is the
reason that GPUs have fixed function texturing operations, a fixed pipeline
stage configuration, etc. It saves on memory bandwidth by making the memory
accesses more coherent and dramatically magnifying the usefulness of
relatively small caches.

Basically this is a claim that there's a massive amount of graphical quality
that will be unleashed once we're unchained from the tyranny of the fixed
function GPU behavior that remains, and that this is worth whatever loss in
computational power we'll have when we increase the sizes of caches to cover
the less-coherent memory access.

I'm not sure if I buy that. Games look quite nice already, and I'm not sure if
I could personally tell the difference between what we see in something like
Crysis and something like Toy Story, which didn't have the shackles of fixed
function pipelines.

An easier explanation that's long been a pet theory of mine is that Larrabee
is essentially a thread parallel FP-heavy processor for the scientific market
motivated by GPGPU's encroachment on that market segment, and necessarily
labeled a "CPU/GPU" in order to not step on toes in the wrong places at Intel.

It's definitely going to be an interesting next couple years in graphics,
though.

~~~
modeless
Pixar has their own style that is intentionally not photorealistic. Also, Toy
Story is from 1995. Try instead comparing Crysis to the effects from a modern
blockbuster, which have to hold up well when compared to the live action
shots.

Ditching the fixed-function pipeline isn't all about realism, though. More
flexibility will allow more experimentation with non-photorealistic rendering.
Maybe we'll see more games with graphics that look like they were painted or
drawn.

Another benefit of more flexible rendering should be easier game development.
Writing a complete renderer from scratch will be hard but most games will use
middleware for that. The most expensive part of a modern game is the art, and
a flexible renderer could do a lot to make generating that art easier.

I do share your pet theory about Larrabee being labeled a GPU to avoid
stepping on toes at Intel. However, in order for Larrabee to be
worthwhile for Intel it needs a mass market, and graphics is it. If Larrabee
fails at graphics it will likely not survive.

------
noss
"Load 16-wide vector register from scalars from 16 independent memory
addresses, where the addresses are stored in a vector!"
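
If I understand it right, that kind of gather load is the equivalent of
something like this scalar sketch (plain C, the function and names are just
illustrative, no particular ISA assumed):

    #include <stdint.h>

    /* One independent address per lane, so in the worst case the 16
     * loads touch 16 different cache lines. */
    void gather16(const float *base, const uint32_t idx[16], float out[16])
    {
        for (int lane = 0; lane < 16; ++lane)
            out[lane] = base[idx[lane]];
    }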

Won't that cause 16 memory reads, each of which is much costlier than what
one gains from a 16-wide SIMD ALU operation?

What I wanted when I tried to write highly optimized computer-vision code was
to have multiple "cursors" in memory that I would read from and increment,
i.e. the hardware would prefetch the data and my code would operate as if the
data were a stream.
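
That pattern might look roughly like this sketch (plain C, the function and
names are just illustrative): several cursors advance linearly through
memory, which is exactly the kind of access a hardware prefetcher can detect
and stream:

    #include <stddef.h>

    /* Three independent cursors (src_a, src_b, dst), each advancing
     * sequentially; hardware prefetchers handle this pattern well. */
    void blend_rows(const float *src_a, const float *src_b,
                    float *dst, size_t n)
    {
        for (size_t i = 0; i < n; ++i)
            dst[i] = 0.5f * (src_a[i] + src_b[i]);
    }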

~~~
beza1e1
Your processor fetches a whole cache line at once anyway, so it isn't more
memory reads. The memory address needs to be aligned to 16 bytes, though.

~~~
noss
But it says the addresses in the vector are independent. What if they don't
all point into the same cache line? You could have 16 cache misses.

What have I misunderstood?

~~~
jacquesm
Yes, you could have 16 misses, but the next n reads (where n = cache line
size / entry size, e.g. 16 for 4-byte floats with a 64-byte line) will all be
hits...

You have to risk that miss at some point. And it does not matter whether you
see the misses one by one or 16 in one go; it's the same total penalty.

~~~
fhars
A more interesting worst case would be one where the 16 slots take turns
missing the cache, so that almost everything is read from the cache but there
is still a cache-miss penalty between every two SIMD instructions. But if you
write your code in that style, you get what you deserve :-).

Even some current architectures [edit: like, for example, x86] have
instructions that hint to the memory subsystem that the program intends to
stream some part of memory in the near future. I guess these will rise in
importance as more programs have to avoid these kinds of problems.
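
On x86 these hints are exposed as the PREFETCH family of instructions; a
minimal sketch using the SSE intrinsic (the 16-element lookahead distance is
an arbitrary illustrative choice):

    #include <stddef.h>
    #include <xmmintrin.h>

    /* _mm_prefetch asks the cache hierarchy to start pulling in a line
     * ahead of time; it is only a hint, not a guarantee. */
    float sum_with_prefetch(const float *data, size_t n)
    {
        float sum = 0.0f;
        for (size_t i = 0; i < n; ++i) {
            if (i + 16 < n)
                _mm_prefetch((const char *)&data[i + 16], _MM_HINT_T0);
            sum += data[i];
        }
        return sum;
    }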

------
jacquesm
That's a really great presentation!

One observation, though: what is sold as a 'game' today is more of an
interactive movie.

As for Larrabee, I can't wait until it comes out. That chip is going to give
NVIDIA serious headaches.

Lots of good stuff in there; I just skimmed it and will go back later to read
it again.

~~~
jwilliams
Larrabee seems very Itanium-like to me... And it's something I'd never really
considered, but it seems like GPUs actually take a lot from the Itanium camp.

Perhaps they were on the right track, but simply too early (afaik one of the
root-cause problems of Itanium was the die size).

~~~
rbanffy
Itanium?!

Itanium is not a many-core processor: it is a VLIW processor.

If Larrabee is reminiscent of any other processor, it's the Niagara family
from Sun.

~~~
bvttf
I read "itanium-like" not in terms of architecture, but in market timing,
delivery on performance promises, and scalability for future revisions.

~~~
rbanffy
I think Larrabee is a lot more promising than Itanium ever was. At the bare
minimum, it's a server-like x86 processor.

I would prefer something with a more elegant ISA than an x86, but that's what
we have in front of us now, thanks to the huge mass of non-portable x86 code
that's so critical for success on the desktop...

------
hypermatt
The Epic Games guys always have the most interesting presentations; they seem
more up on current software techniques like STM and functional programming,
which I don't hear a lot of other game companies talk about.

------
akamaka
It's particularly interesting to read Sweeney's views on this subject,
considering that he's been talking about the shift back to software rendering
since 2000 or so, when GPUs started becoming programmable.

Check out this old interview:
<http://archive.gamespy.com/legacy/interviews/sweeney.shtm>

_2006-7: CPUs become so fast and powerful that 3D hardware will be only
marginally beneficial for rendering relative to the limits of the human visual
system, therefore 3D chips will likely be deemed a waste of silicon (and more
expensive bus plumbing), so the world will transition back to software-driven
rendering... If this is the case, then the 3D hardware revolution sparked by
3dfx in 1997 will prove to only be a 10-year hiatus from the natural evolution
of CPU-driven rendering._

His timeline was off by a few years, but I think he basically had the right
idea all along.

~~~
miloshh
Note that he's not saying now that GPUs are becoming marginal; it's the
fixed-function pipeline he dislikes. The "software rendering" he calls for
will most likely run on a GPU.

------
peripitea
The return to CPU-based rendering might do wonders for PC gaming's popularity.
It hadn't occurred to me until I read this, but it seems like the decline of
PC gaming has tracked fairly closely with the increasing reliance on top-end
graphics cards. There is probably a huge slice of people at the margins who
have shunned PC gaming (actively or passively) because of the various burdens
that requiring a graphics card adds.

