If I just saw "specialized shaders replaced with ubershaders" on a feature update, I probably wouldn't think there was much of a story to it.
They've recently started a side blog (http://emucross.com/), as well.
If you're curious, their reddit submission history (https://www.reddit.com/user/JMC4789/submitted/) has a bunch of interesting stuff.
Apparently their infrastructure costs are covered by the few ads on the site and they're happy to leave it at that.
We took a similar approach when building de Blob 2 for Wii, X360, and PS3. We defined all our materials in terms of TEV stages. On the Wii that was used to set up the TEV when rendering. For X360 and PS3 we had an ubershader that emulated the TEV stages. This made it much easier for the artists; they built all materials in one tool in terms of what are essentially register combiners. We also allowed them to create more complex materials for X360/PS3 that would override the base material and do things that the Wii didn't actually support.
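For anyone curious what emulating a TEV stage involves, here's a minimal C sketch of a single combiner stage, the d ± lerp(a, b, c) equation with bias, scale, and clamp. The struct layout and names are illustrative only, not the actual de Blob or Dolphin code:

    #include <stdio.h>

    /* Hypothetical encoding of one TEV stage; the real hardware packs
       input selectors, bias, scale and clamp flags into registers. */
    typedef struct {
        float a, b, c, d;   /* inputs already selected for this stage */
        float bias;         /* -0.5, 0 or +0.5 */
        float scale;        /* 0.5, 1, 2 or 4 */
        int   subtract;     /* 0 = add, 1 = subtract */
    } TevStage;

    static float clamp01(float x) { return x < 0 ? 0 : (x > 1 ? 1 : x); }

    /* One channel of the combiner equation:
       out = clamp((d +/- lerp(a, b, c) + bias) * scale) */
    static float tev_combine(const TevStage *s)
    {
        float lerp = s->a * (1.0f - s->c) + s->b * s->c;
        float out  = s->d + (s->subtract ? -lerp : lerp) + s->bias;
        return clamp01(out * s->scale);
    }

    int main(void)
    {
        TevStage stage = { 0.25f, 0.75f, 0.5f, 0.1f, 0.0f, 1.0f, 0 };
        printf("stage output: %f\n", tev_combine(&stage));
        return 0;
    }

A real material chains up to 16 of these stages, each feeding its output into the inputs of the next.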
Ugh, it pains me to imagine users that would be anything but appreciative towards these developers, but kudos to the devs for using that abuse as inspiration.
There it is again :(
People who do emulation are, quite simply, the very, very best of us.
My other takeaway is: just don't bother getting an Nvidia card if you can avoid it.
If anyone is interested in checking out the massive ubershaders, I've stashed a copy here:
Also - does this generalise to other rendering pipelines for other devices do you think?
The main difference with ubershaders is that they skip the shader generation/compilation step and directly interpret the raw shader binary.
> Also - does this generalise to other rendering pipelines for other devices do you think?
Modern shader cores are basically Turing-complete, so you should be able to emulate anything, including the Wii U's modern GPU. Though, while it might be possible to run modern shaders in this manner, it won't run fast enough, because you will have to spill some (or even a lot) of the shader's state to the host GPU's main memory.
Ubershaders work well for Dolphin because the entire mutable state of the GameCube's pixel shaders fits into the available registers, with plenty of space left over. I assume it would work well for other DirectX 8 or even DirectX 9 era GPUs.
Making it correct is much more difficult than making it at all. I'd be shocked if their emulator got the basics correct in every detail. Example: texture wrapping works by taking the UV texcoord modulo the texture size. But since a texcoord can be negative, your modulo operation has to account for that at the boundary between positive and negative, or else you end up with truncated texels.
It doesn't sound like they were really writing a rasterizer, but it's the same concept.
Is that the fmod(fmod(a, b) + b, b) thing, or something else? I still wish that was the default behaviour in more languages…
http://chrishecker.com/images/9/97/Gdmtex2.pdf page 21 has a correct FloorDivMod function.
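For illustration, here's the floored-modulo wrap from the comment above as a small C program (function names are mine, not from the PDF):

    #include <math.h>
    #include <stdio.h>

    /* Plain fmod() truncates toward zero, so negative texcoords land in
       the wrong texel at the 0 boundary. The double-fmod trick yields a
       true floored modulo instead. */
    static float wrap_coord(float u, float size)
    {
        return fmodf(fmodf(u, size) + size, size);
    }

    int main(void)
    {
        /* -0.25 should wrap to 15.75 on a 16-texel axis, not -0.25 */
        printf("%f\n", wrap_coord(-0.25f, 16.0f));
        printf("%f\n", wrap_coord(17.5f, 16.0f));
        return 0;
    }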
The final approach of interpreting the shaders initially, while compiling them in the background for greater performance, sounds very similar to what just-in-time compilers do.
If you think about it, the problems they face are also kind of similar: both systems are confronted with unpredictable, dynamic "source code", and both want to achieve high performance while avoiding the lag introduced by ahead-of-time compilation, so it makes sense that a similar solution approach would work.
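As a rough illustration of that hybrid strategy (not Dolphin's actual code, and all names here are hypothetical), the control flow looks something like this single-threaded C sketch; a real implementation compiles on worker threads:

    #include <stdbool.h>
    #include <stdio.h>

    typedef struct { unsigned uid; bool ready; } ShaderEntry;

    #define CACHE_SIZE 256
    static ShaderEntry cache[CACHE_SIZE];

    static void draw_specialized(unsigned uid) { printf("fast path %u\n", uid); }
    static void draw_ubershader(unsigned uid)  { printf("interpret %u\n", uid); }

    /* In this sketch the "background" compile finishes instantly;
       really it would be queued to a worker thread. */
    static void enqueue_compile(unsigned uid)
    {
        cache[uid % CACHE_SIZE] = (ShaderEntry){ uid, true };
    }

    static void draw(unsigned uid)
    {
        ShaderEntry *e = &cache[uid % CACHE_SIZE];
        if (e->ready && e->uid == uid) {
            draw_specialized(uid);   /* compiled shader is available */
        } else {
            enqueue_compile(uid);    /* start compiling in the background */
            draw_ubershader(uid);    /* interpret meanwhile: no stutter */
        }
    }

    int main(void)
    {
        draw(42);   /* first hit: interpreted, compile kicked off */
        draw(42);   /* later hit: specialized shader */
        return 0;
    }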
Today's consoles seem like repackaged PCs with few changes, but the older ones seem like actual dedicated gaming hardware, especially the PS2 with its Emotion Engine and a PS1 as the disk controller. What the hell (in a good way)?!
What I don't understand is why this was the case. I wonder if a general-purpose-PC-like architecture that was powerful enough to play games of the intended caliber was simply too expensive at the time.
Some games on the PS2, like Fatal Frame or Haunting Ground, are impressive even by today's standards and could pass for double-A games nowadays (a full 17 years later). That's just impressive. And their hardware specs read like a real spec for a gaming machine (the EE's 2 VPUs, the PSX in the PS2 for compatibility, etc.), not just "a bunch of PC CPUs and GPUs from AMD in a box + a Blu-ray drive". Ironically, the first original Xbox prototype was actually (I think, or maybe it's a rumor) made out of laptop components.
The NES was a bit weak in comparison but very cheap (the Pegasus cost like 50 PLN in the early 2000s).
That wasn't because the designs were more clever, mind you, but just because the hardware designers didn't need to think in terms of an architecture that contained concepts like dynamic frequency scaling, multi-monitor support, and a kernel that blocks on disk IO. Consoles were hard real-time embedded systems, and the games were their unikernels; well into the PS2 era, console games were still relying on VBlank interrupts for physics timing!
And what this got you was effects that were only achievable on an $8000 SGI workstation, for $300. Slightly beyond state-of-the-art, for cheap. But in exchange, it forced heavy consolidation in the console manufacturer market, because developing that specialized hardware wasn't cheap (like it was back in the 8-bit micro era).
But "generic" PC GPUs eventually started scaling in power geometrically, to the point where the specialization and hard real-time guarantees just weren't needed any more to achieve modern graphics cheaply. The low-level-targeted specialized-graphics-ASIC technique wouldn't be of much benefit today, because six months later there'd be another new "generic" GPU out that could do what that ASIC does without breaking a sweat.
The same thing happened in networking: ASIC switches with special RTOSes were needed to run data centers, until CPUs and PCIe advanced enough to take over. Now everything (except Internet-backbone gear) is Software-Defined Networking, i.e. generic boxen running VMware ESXi running a Linux/BSD router VM.
> Users refer to this as "sharing shaders" and in theory if users shared UID files, they could compile shaders ahead of time and not encounter stuttering.
They list a few reasons why they didn't go for this, but the tl;dr is that it is hard to build a complete list, so many users in many games would still see stuttering.
As far as I can tell it worked pretty well.
I mean those bits of animation between levels. I understand that many animations are scripted rather than being movie recordings.
Apologies if I'm using the wrong words. I don't play many games.
 https://wingolog.org/archives/2011/07/05/v8-a-tale-of-two-co... (Wow, 2011, time goes by fast)
What does the code look like?
Found some flaky documentation: http://amnoid.de/gc/tev.html
edit: phire posted an example ubershader synth https://news.ycombinator.com/item?id=14886856
But from a hardware perspective, there is no solid cutoff between "non-shaders" and "shaders".
Because DirectX 8 era shaders are just slightly more capable register combiners, with a swanky new shader-based programming interface. Oh, and those 8 instructions were shoved into registers, just like on the GameCube.
In some ways, the GameCube's GPU is actually more capable than other DirectX 8 era GPUs (it supports up to 16 instructions), so it's my position that if you consider DirectX 8 GPUs to have proper pixel shaders, then you should consider the GameCube as having pixel shaders too, just with an older, clunkier programming interface.
The various parts of an instruction (aka register combiner stage) might be split across several registers, but the GameCube GPU executes them as whole instructions in series, just like a shader core. Our ubershader acts like a proper interpreter, reading the raw data out of these registers cycle by cycle and interpreting it as machine code, just like the real hardware.
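To make that concrete, here's a toy C sketch of the "interpret the registers as machine code" idea. The bit layout is invented for illustration; the real GameCube BP register encoding is different:

    #include <stdint.h>
    #include <stdio.h>

    /* A made-up 16-bit packing of one combiner stage's input selectors
       and operation, standing in for the real register format. */
    typedef struct { unsigned sel_a, sel_b, sel_c, sel_d, op; } StageInstr;

    static StageInstr decode_stage(uint16_t reg)
    {
        StageInstr i;
        i.sel_a = (reg >>  0) & 0xF;  /* input selector A */
        i.sel_b = (reg >>  4) & 0xF;  /* input selector B */
        i.sel_c = (reg >>  8) & 0xF;  /* input selector C */
        i.sel_d = (reg >> 12) & 0x7;  /* input selector D */
        i.op    = (reg >> 15) & 0x1;  /* 0 = add, 1 = subtract */
        return i;
    }

    int main(void)
    {
        /* An interpreter loop would decode and execute one of these
           per TEV stage, per pixel, every cycle. */
        StageInstr i = decode_stage(0x8421);
        printf("a=%u b=%u c=%u d=%u op=%u\n",
               i.sel_a, i.sel_b, i.sel_c, i.sel_d, i.op);
        return 0;
    }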
How can we apply these excellent algorithms to machine learning?
What they did is not really useful for ML. As they said themselves, their ubershader is massively inefficient.
I'm genuinely curious as to what articles/papers you've read that made you think deep learning is basically "running a graphics card backwards".