
Ubershaders: A Ridiculous Solution to an Impossible Problem - voltagex_
https://dolphin-emu.org/blog/2017/07/30/ubershaders/
======
et2o
What an awesome writeup. I am not even really personally invested in the
problem as I don't play older video games, but I loved reading the story. I
wish other open source projects would do similar writeups when they reach
major accomplishments.

If I just saw "specialized shaders replaced with ubershaders" on a feature
update, I probably wouldn't think there was much of a story to it.

~~~
richdougherty
I'd love to know how this project manages to get such high quality stories out
for every update. I assume their project has a community member who is
passionate about writing. It's such a rare, but useful thing to have someone
volunteer their time to do great technical writing. I wish someone would do it
for the open source projects I work on!

~~~
ClassyJacket
It's an incredibly impressive project. I can't believe they're not on Patreon
(maybe this type of project doesn't qualify?). They're more high quality with
their updates and the writeups about their updates than most commercial
software made by professionals getting paid.

~~~
jsheard
The Dolphin project doesn't accept donations in any form, IIRC they said that
fairly splitting the money between the contributors would be too difficult.

Apparently their infrastructure costs are covered by the few ads on the site
and they're happy to leave it at that.

~~~
im3w1l
Wow, that's a decision I deeply respect.

------
quotemstr
This is an amazing article. I _love_ technical problems for which the
prevailing consensus moves from "this isn't a problem" to "this problem is
impossible to fix" to "the proposed fix could never work" to "doing it right
would be too much work" to "the solution was inevitable".

~~~
hossbeast
Can you name/link to some other examples?

~~~
nbarbettini
SpaceX spending almost a decade developing, testing, and then deploying
rockets that can land and then be reused is a good example.

------
AceJohnny2
Is there any higher accolade in the field than having John Carmack say
"Dolphin updates are wonderful system engineering articles." ?

[https://twitter.com/ID_AA_Carmack/status/891803321777897472](https://twitter.com/ID_AA_Carmack/status/891803321777897472)

------
davidmurdoch
My favorite part: "Despite being around 90% complete, the last 90% still
remained to be done"

~~~
ubernostrum
This is a rather old joke in software:

[https://en.wikipedia.org/wiki/Ninety-ninety_rule](https://en.wikipedia.org/wiki/Ninety-ninety_rule)

~~~
davidmurdoch
I've heard it, and variations of it, many times before, but never realized it
went back that far.

------
bananaboy
This is really great stuff from the Dolphin team!

We took a similar approach when building de Blob 2 for Wii, X360, and PS3. We
defined all our materials in terms of TEV stages. On the Wii that was used to
set up the TEV when rendering. For X360 and PS3 we had an ubershader that
emulated the TEV stages. This made it much easier for the artists; they built
all materials in one tool in terms of what are essentially register combiners.
We also allowed them to create more complex materials for X360/PS3 that would
override the base material and do things that the Wii didn't actually support.

------
rrradical
"Over the past few years, we've had users ask many questions about shader
stuttering, demand action, declare the emulator useless, and some even cuss
developers out over the lack of attention to shader compilation stuttering."

Ugh, it pains me to imagine users that would be anything but appreciative
towards these developers, but kudos to the devs for using that abuse as
inspiration.

------
the_mitsuhiko
> macOS Graphics Drivers are Still Terrible

There it is again :(

~~~
eridius
I assume Dolphin doesn't have a mode that uses Metal? Because that would
presumably make it work well on macOS, as Metal is where Apple's been focusing
their efforts for a while now.

~~~
derefr
I'm curious how similar Metal is to Vulkan in API-surface terms. Would it be
easier to develop a Metal backend for Dolphin by starting from the macOS
Vulkan backend than by starting from scratch?

------
chris_wot
Dear God, the solution _is_ insane! That is mind-blowing... emulating the
whole pipeline?!?!

People who do emulation are, quite simply, the very, very best of us.

My other take away is: just don't bother getting an Nvidia card if you can
avoid it.

~~~
phire
It doesn't quite emulate the whole 3D pipeline, but it emulates the entire
pixel and vertex stages.

If anyone is interested in checking out the massive ubershaders, I've stashed
a copy here:

[https://gist.github.com/phire/25181a9bfd957ac68ea8c74afdd9e9...](https://gist.github.com/phire/25181a9bfd957ac68ea8c74afdd9e9e1)

~~~
chris_wot
So if I read the article right, this shader emulates those parts of the
rendering pipeline of the GameCube/Wii... which to my mind still just sounds
absolutely amazing - does this mean you've had to implement any quirks of the
devices into the shader?

Also - does this generalise to other rendering pipelines for other devices do
you think?

~~~
phire
Yeah, though we had already implemented all those quirks in the old generated
shaders.

The main difference with ubershaders is that we skip the shader
generation/compilation and directly interpret the raw shader binary.

> Also - does this generalise to other rendering pipelines for other devices do you think?

Modern shader cores are more or less Turing complete, so you should be able to
do the same on any other rendering pipeline which fits into 'opengl pipeline
model', including other modern GPUs.

Though, while it might be possible to run modern shaders in this manner, it
won't run fast enough, because you will have to spill some (or even a lot) of
the shader's state to the host GPU's main memory.

Ubershaders work well for Dolphin because the entire mutable state of the
GameCube's pixel shaders fits into the available registers, with plenty of space
left over. I assume it will work well for other DirectX 8 or even DirectX 9
era GPUs.

------
lordleft
This was a really well written overview of a technical puzzle and its
eventual resolution. Loved the lucidity of the prose!

------
sltkr
Fascinating read!

The final approach of interpreting the shaders initially, while compiling them
in the background for greater performance, sounds very similar to what just-
in-time compilers do.

If you think about it, the problems they face are also kind of similar: both
systems are confronted with unpredictable, dynamic "source code", and both
want to achieve high performance while avoiding the lag introduced by
ahead-of-time compilation, so it makes sense that a similar approach might
work.
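
To make the comparison concrete, here is a minimal sketch of the "interpret now, specialize in the background" pattern. Everything in it is made up for illustration: the `(op, arg)` config format, `interpret`, `compile_specialized` and `HybridRunner` are hypothetical names, and the `sleep` just stands in for a slow driver compile. It is not how Dolphin is actually structured.

```python
import threading
import time


def interpret(config, x):
    """Generic slow path: decodes the config on every call (the 'ubershader')."""
    for op, arg in config:
        if op == "add":
            x += arg
        elif op == "mul":
            x *= arg
    return x


def compile_specialized(config):
    """Hypothetical compiler: decodes the config once and returns a fast closure."""
    time.sleep(0.05)  # stand-in for an expensive shader compile
    ops = {"add": lambda v, a: v + a, "mul": lambda v, a: v * a}
    steps = [(ops[op], arg) for op, arg in config]

    def specialized(x):
        for fn, arg in steps:
            x = fn(x, arg)
        return x

    return specialized


class HybridRunner:
    """Uses the interpreter until a specialized version has finished compiling."""

    def __init__(self):
        self._compiled = {}
        self._pending = {}
        self._lock = threading.Lock()

    def _compile_in_background(self, key, config):
        fn = compile_specialized(config)
        with self._lock:
            self._compiled[key] = fn

    def run(self, config, x):
        key = tuple(config)
        with self._lock:
            fn = self._compiled.get(key)
            if fn is None and key not in self._pending:
                # First time we see this config: start compiling, do not wait.
                worker = threading.Thread(
                    target=self._compile_in_background,
                    args=(key, config),
                    daemon=True,
                )
                self._pending[key] = worker
                worker.start()
        if fn is not None:
            return fn(x), "specialized"
        return interpret(config, x), "interpreted"

    def wait(self):
        """Block until all background compiles are done (useful for tests)."""
        for worker in list(self._pending.values()):
            worker.join()
```

The first `run` call for a new config never blocks on the compile; it falls back to `interpret` and later calls pick up the specialized closure once it exists.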

------
Fiahil
I never thought people would be working full time on emulator projects. I
guess I really underestimated the amount of work going there.

~~~
thebigjc
The CEMU patreon is quite well funded:
[https://www.patreon.com/cemu](https://www.patreon.com/cemu)

------
FRex
Call me old fashioned or stupid (just not nostalgic, the best I ever owned was
Pegasus, a hardware clone of a NES/Famicom) but whenever I see these issues
with older Sony or Nintendo stuff I am in awe.

Today's consoles seem like repacked PCs with few changes but the older ones
seem like actual dedicated gaming hardware, especially PS2 with Emotion Engine
and PS1 as disk controller, what the hell (in a good way)?!

~~~
rybosome
I'm not an expert, but that's my understanding as well. I believe that's one
of the reasons that exclusives were far more common in those days; a port was
not a minor engineering effort, you had to do a total rewrite.

What I don't understand is why this was the case. I wonder if a general-
purpose-PC-like architecture that was powerful enough to play games of the
intended caliber was simply too expensive at the time.

~~~
fragmede
Yes - during the NES era, a reasonably powerful computer for games cost closer
to $5k while the NES was closer to $100.

~~~
FRex
I'm talking more of PSX and PS2 era.

Some games on PS2 like Fatal Frame or Haunting Ground are impressive even by
today's standards and could pass for double A games nowadays (an entire 17
years later). That's just impressive. And their hardware specs read like a real
spec for a gaming machine (the EE's 2 VPUs, the PSX in PS2 for compatibility,
etc.), not just "bunch of PC CPUs and GPUs from AMD in a box + Blu-ray drive".
Ironically, the first original Xbox prototype was actually (I think, or maybe
it's a rumor) made out of laptop components.

NES was a bit weak in comparison but very cheap (Pegasus cost like 50 PLN in
the early 2000s).

~~~
derefr
In the PSX/PS2 era (or really, the era starting from the SNES's SuperFX chip),
the dedicated graphics ASICs in consoles, combined with the fact that those
ASICs were being targeted individually at a low-level by game devs, were
putting out results that seriously outpaced what you'd expect out of your PC's
3DFX Voodoo card.

That wasn't because the designs were more _clever_ , mind you; but just
because the hardware designers didn't need to think in terms of an
architecture that contained concepts like dynamic frequency scaling and multi-
monitor support and a kernel that blocks on disk IO. Consoles were hard real-
time embedded systems, and the games were their unikernels; well into the PS2
era, console games were still relying on VBlank interrupts for physics timing!

And what this got you, was effects that were only achievable on an $8000 SGI
workstation, for $300. Slightly-beyond-state-of-the-art, for cheap. But in
exchange, it forced heavy consolidation in the console manufacturer market,
because developing that specialized hardware wasn't cheap (like it was back in
the 8-bit micro era.)

But "generic" PC GPUs eventually started scaling in power geometrically, to
the point where the specialization and hard real-time guarantees just weren't
_needed_ any more to achieve modern graphics cheaply. The low-level-targeted
specialized-graphics-ASIC technique wouldn't be of much benefit today, because
six months later there'd be another new "generic" GPU out that could do what
that ASIC does without breaking a sweat.

The same thing happened in networking: ASIC switches with special RTOSes were
needed to run data centers—until CPUs and PCIe advanced enough to take over.
Now everything (except Internet-backbone gear) is Software-Defined Networking,
i.e. generic boxen running VMware ESX running a Linux/BSD router VM.

------
slaymaker1907
Why not take a profiling approach and cache the configurations rather than the
compiled shader? You could then compile them on startup. By caching the
configurations, you could then share this data between hosts and don't have to
invalidate them as often.
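
A rough sketch of what I mean, assuming a configuration can be flattened to a tuple of integers (register values). The file format and the names `record_configs` and `warm_cache` are hypothetical, and `compile_fn` stands in for whatever actually compiles a shader for the local GPU and driver.

```python
import json


def record_configs(path, configs):
    """Persist the unique pipeline configurations seen during a play session."""
    unique = sorted({tuple(c) for c in configs})
    with open(path, "w") as f:
        json.dump([list(c) for c in unique], f)


def warm_cache(path, compile_fn):
    """At startup, compile one shader per recorded configuration."""
    with open(path) as f:
        configs = [tuple(c) for c in json.load(f)]
    # The compiled results stay local; only the config list would be shared.
    return {c: compile_fn(c) for c in configs}
```

The point is that the JSON file does not depend on the GPU or driver version, so it could be shared between hosts, while the compiled output is rebuilt on each machine.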

~~~
specialist
Could they use the "cut scenes" (?) to prime the shader cache(s)?

I mean those bits of animation between levels. I understood that many
animations are scripted, vs movie recordings.

Apologies if I'm using the wrong words. I don't play many games.

~~~
zeta0134
In a traditional game this is exactly how it's done. Dolphin is a bit
different though, because it's an emulator. It is aware that a game is using
the GPU, but isn't aware of the inner workings of each game's logic. Dolphin
doesn't know the difference between a cutscene, gameplay, a pre-rendered
movie, or a credits sequence. Because of this, it has to be written in a more
general manner, so that a game running inside of it can ask it to do anything
a Gamecube or Wii would be able to do, and still come up with the right
result.

------
bpicolo
The Dolphin project always has amazing writeups for complicated technical
problems. Really love these. Amazing work from that whole team

------
misingnoglic
It always amazes me how dedicated and talented the engineers who work on these
projects are, amazing :)

------
Nican
The dolphin emulator's blog is doing awesome blog posts as usual. Reminds me
of how JavaScript compilers[1] also compile several versions of the same
function, as the interpreter gains more insight on how the function is used.

[1] [https://wingolog.org/archives/2011/07/05/v8-a-tale-of-two-
co...](https://wingolog.org/archives/2011/07/05/v8-a-tale-of-two-compilers)
(Wow, 2011, time goes by fast)

------
ginko
This approach actually seems more straightforward and easier to maintain than
the original shader cache system. Of course when dolphin was originally
written this wasn't feasible on hardware at that time, but nowadays I'd say
shaders of this complexity aren't that unusual.

------
spondyl
Dolphin always do great writeups and this is no different. Real nice!

------
br1
Reminds me of
[https://01.org/fast-ui-draw/blogs/krogovin/2016/fast-ui-draw...](https://01.org/fast-ui-draw/blogs/krogovin/2016/fast-ui-draw-technical-details-1),
a Canvas implementation from Intel that also uses an uber shader.

------
nhaehnle
Now refactor everything so that the purpose-built shaders are actually
generated from the ubershader simply by hard-coding the various decision
points, possibly using preprocessor tricks? Seems like a natural next step
that should be able to simplify the emulator a lot...

~~~
dom0
In fact, that's the usual meaning of uber-shaders; large shaders parametrized
at compile time.
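
For anyone who hasn't seen that style, a minimal sketch of compile-time parametrization: the same source is specialized by prepending `#define` lines before it goes to the compiler. The GLSL snippet, the `FIXED_NUM_STAGES` macro and the `specialize` helper are invented for illustration and are not taken from Dolphin.

```python
UBER_SOURCE = """\
#version 330 core
#ifdef FIXED_NUM_STAGES
const int num_stages = FIXED_NUM_STAGES;  // decision baked in at compile time
#else
uniform int num_stages;                   // decision read at run time
#endif
"""


def specialize(source, defines):
    """Return the shader source with one #define per hard-coded decision."""
    header = [f"#define {name} {value}" for name, value in sorted(defines.items())]
    lines = source.splitlines()
    # GLSL requires #version to stay on the first line, so insert after it.
    insert_at = 1 if lines and lines[0].startswith("#version") else 0
    return "\n".join(lines[:insert_at] + header + lines[insert_at:]) + "\n"
```

Calling `specialize(UBER_SOURCE, {})` gives back the fully dynamic variant, and `specialize(UBER_SOURCE, {"FIXED_NUM_STAGES": 2})` gives a variant where the compiler can unroll and strip the unused branches.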

------
phkahler
In short, they went from shader translation to emulation (on the GPU) which
eliminates the delays of dynamic translation. Fortunately the emulation is
fast enough that it works great.

------
randyrand
does anyone know _what_ the interpreter actually needs to interpret?

what does the code look like?

~~~
to3m
It's a set of register settings. The code in the game looks like a bunch of
function calls; the stuff the emulator has to deal with is a bunch of opcodes
in the FIFO. There's no interpretation per se in the shader - it's not like a
pixel shader or a vertex shader, with some kind of bytecode type of affair.
It's more like the NV_register_combiners GL extension
([http://developer.download.nvidia.com/opengl/specs/GL_NV_regi...](http://developer.download.nvidia.com/opengl/specs/GL_NV_register_combiners.txt)).

~~~
phire
From a software perspective, there is a solid cut-off between "not-shaders" and
"shaders". DirectX 7 compatible GPUs don't have "shaders" and DirectX 8
compatible GPUs do have "shaders", even if these "shaders" are primitive and
only support a maximum of 8 instructions.

But from a hardware perspective, there is no solid cut off between "non-
shaders" and "shaders".

Because DirectX 8 era Shaders are just slightly more capable register
combiners, with a swanky new shader based programming interface. Oh, and those
8 instructions were shoved into registers, just like on the gamecube.

In some ways, the GameCube's GPU is actually more capable than other DirectX 8
era GPUs (it can support up to 16 instructions), so it's my position that if
you consider DirectX 8 GPUs to have proper pixel shaders, then you should
consider the GameCube as having pixel shaders too, just with an older clunky
programming interface.

The various parts of the instruction (aka register combiner stage) might be
split over various registers, but the gamecube GPU executes them as whole
instructions in series, just like a shader core. Our Ubershader acts like a
proper interpreter, interpreting the raw data out of these registers cycle by
cycle as machine code, just like the real hardware.
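
A toy model of what "interpreting the registers" can look like, in Python rather than shader code. The stage layout `(a, b, c, d, dest)`, the register names and the blend formula are simplified assumptions for illustration. This is not Dolphin's actual ubershader and not the real hardware encoding.

```python
def run_stages(stages, inputs):
    """Interpret combiner stages in order; every stage reads the register file."""
    regs = dict(inputs)
    regs.setdefault("prev", 0.0)
    for a, b, c, d, dest in stages:
        va, vb, vc, vd = (regs[name] for name in (a, b, c, d))
        # Assumed combiner: d + lerp(a, b, c), clamped to [0, 1].
        regs[dest] = min(1.0, max(0.0, vd + (1.0 - vc) * va + vc * vb))
    return regs["prev"]


# Hypothetical inputs: a texture sample, a rasterized color and some constants.
inputs = {"tex": 0.5, "ras": 1.0, "zero": 0.0, "one": 1.0, "half": 0.5}
stages = [
    ("zero", "tex", "ras", "zero", "prev"),   # prev = tex * ras
    ("prev", "one", "half", "zero", "prev"),  # prev = halfway between prev and 1
]
result = run_stages(stages, inputs)
```

The loop body never changes; only the data in `stages` does, which is why a new game configuration needs no new compile.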

------
wellsjohnston
Metroid Prime was so good.

------
throwaway0to1
All this work and you can buy a used GameCube system for $50 USD. What's the
point?

~~~
Narishma
You won't be able to buy a working used GameCube system for $50 forever.

~~~
imtringued
And the game disks won't last forever either.

~~~
NickPollard
and Dolphin can upscale to e.g. 1080p rather than the gamecube's native 480p.

------
libeclipse
> Blog tags
    
    
      3d  4.0  5.0  60FPS  Accessory  adreno  amd Analysis  android  announcement  arm  audio bestof  bug  bugfix  Cheats  Commemoration D3D  D3D9  Datel  driver  Factor5 Feature Removal  Foundation  Gamehacks  gpu Graphics  Hardware  HD  hle  intel  Legal Licensing  mali  mesa  Netplay  new feature nvidia  OGL  Patches  performance progress report  Qt  qualcomm  release releasecandidate  Review  shieldtv  stereo stereoscopy  technical  ubershaders  ui Unlicensed  video  vulkan  Wii  wiimote  Wiimote Wind Waker
    

Err..

------
peterburkimsher
This is incredible work. Predicting, sharing, asynchronous compilation, and
reverse-engineering the pipeline are all very creative solutions to a really
difficult problem. As I understand, deep learning basically runs graphics
cards backwards to generate text from images.

How can we apply these excellent algorithms to machine learning?

~~~
21
Machine learning doesn't run the graphics card backwards.

What they did is not really useful for ML. As said by them, their ubershader
is massively inefficient.

