Playstation 2 GS emulation – the final frontier of Vulkan compute emulation (themaister.net)
222 points by cton 17 days ago | 66 comments



How far do I gotta read before this article will expand the "GS" acronym? Interesting stuff but I'm being left behind here.


It stands for “Graphics Synthesizer” - Sony’s name for the “GPU” in the PS2.


I realized from the recent AnandTech sunset article [1] that 'GPU' was coined by Nvidia way later than I expected -

>A lot of things have changed in the last quarter-century – in 1997 NVIDIA had yet to even coin the term “GPU”

[1] https://www.anandtech.com/show/21542/end-of-the-road-an-anan...


How about the "Sony GPU" (1994) used in the PlayStation?

Edit: source https://www.computer.org/publications/tech-news/chasing-pixe...


The article said it resolved to "Geometry Processing Unit" at that time.


And RAID once stood for "redundant array of inexpensive discs" and now stands for "independent discs". But you are right, it's a different meaning.

Found this here: https://books.google.de/books?id=Jzo-qeUtauoC&pg=PT7&dq=%22g... In a 1976 issue of Computerworld magazine, the VGI is called a graphics processing unit (GPU):

> 3400 is a direct-writing system capable of displaying 3-D graphics and alphanumerics with speeds up to 20.000....

It's not what I would call a GPU, but I think it's hard to draw lines when it comes to naming things and defining things.

If anyone else wants to try to find the real GPU:

https://www.google.com/search?q=%22gpu%22+graphics+processin...


I misunderstood that bit from the Jon Peddie article, the PS1's GPU was indeed a Graphics Processing Unit. Found this hardware overview in one of the Wikipedia sources to confirm: https://archive.org/details/nextgen-issue-006/page/n54/mode/...


I believe the distinction NVIDIA drew was that they considered their product the first all-in-one graphics unit:

> a single-chip processor with integrated transform, lighting, triangle setup/clipping, and rendering engines that is capable of processing a minimum of 10 million polygons per second

It’s kind of arbitrary, even when you take out the processing rate. But prior to that there was still a significant amount of work expected to be done on the CPU before feeding the GPU.

That said, the term GPU did definitely exist before NVIDIA, though not meaning the same thing we use it for today.


TI's arcade chips are considered among the first.

"The TMS34010, developed by Texas Instruments and released in 1986, was the first programmable graphics processor integrated circuit. While specialized graphics hardware existed earlier, such as blitters, the TMS34010 chip is a microprocessor which includes graphics-oriented instructions, making it a combination of a CPU and what would later be called a GPU."

https://en.m.wikipedia.org/wiki/TMS34010

And they weren't alone in the history of graphics hardware.


Yep, and I think perhaps that’s where folks are getting hung up.

NVIDIA didn’t invent the GPU. They coined the modern term “graphics processing unit”. Prior to that, various hardware existed but went by other expanded names or didn’t fully match NVIDIA’s arbitrary definition, which is what we use today.


IBM's PGA card had an additional 8088 dedicated to graphics primitives: https://en.wikipedia.org/wiki/Professional_Graphics_Controll...


basically with all those bells and whistles (transform, lighting, clipping, shaders, etc, etc)

the "old way" was to engineer a bit of silicon for each one of those things, custom-like. problem was how much silicon to give to each feature; it almost had to be fine-tuned to each individual game, a problem

So nvidia comes up with the idea to sort of have a pool of generic compute units, each of which can do T&L, or shading, etc. Now the problem of fine-tuning to a game is solved. but also now you have a mini compute array that can do math fast, a general-purpose unit of processing (GPU-OP), which was a nod from NVIDIA to the gaming community (OP - overpowered)


I think I remember seeing the term GPU used in a Byte article from the 80s? It was a while ago when I saw it (~15 years), so I can't really remember any details.


Here's GPU (Graphics Processing Unit) in Byte Magazine from February 1985.

https://archive.org/details/byte-magazine-1985-02/1985_02_BY...


That seems to be what I remembered. I did try a few searches, but I didn't find it in the first few results of "byte magazine gpu" on Google, and my search on IA wasn't completing. I didn't feel like spending more time than that...


It’s always BYTE magazine. I think “backslash” is also on them.


There is a SIGGRAPH Pioneers panel where participants talk about the history of the GPU (you can find the full talk on YouTube, this is a summary article): https://www.jonpeddie.com/news/gpus-how-did-we-get-here/

  > Two years later, Nvidia introduced the GPU. He [Curtis Priem] recalls that
  > Dan Vivoli, Nvidia's marketing person, came up with the term GPU, for 
  > graphics processing. "I thought that was very arrogant of him because how 
  > dare this little company take on Intel, which had the CPU," he said.


That helps immensely. Surprised the author assumed this was implicit.


Among the intended audience, it likely is. I think this article is here to equally amuse and traumatize folks familiar with the system.

The Emotion Engine (CPU) to GS (GPU) link was what made the PS2 so impressive for the time, but it also made it somewhat hard to code for and immensely hard to emulate. If I recall correctly, the N64 has something like 4x the memory bandwidth (shared) of the PS1, and the PS2 had roughly 6x (3GB/s) the system bandwidth of the N64. However, the PS2's GS RAM clocked in at 48GB/s, more than the external memory bandwidth of the Cell (~25GB/s), which meant that PS3 emulation of PS2 games was actually done with embedded PS2 hardware.

It was a bonkers machine. I don't think workstation GPU bandwidth reached 50GB/s for another 5-6 years. That said, it was an ultra-simple pipeline with 4MB of RAM and insane DMA requirements, which actually got crazier with the Cell in the PS3. I was at Sony (in another division) in that era. It was a wild time for hardware tinkering and low-level software.


> However, the PS2's GS RAM clocked in at 48GB/s, more than the external memory bandwidth of the Cell (~25GB/s), which meant that PS3 emulation of PS2 games was actually done with embedded PS2 hardware.

That's kinda overselling it, honestly. When you're talking about the GIF, only the VU1's vertex pipeline was able to achieve this speed directly. PATH2/PATH3 used the commodity RDRAM's bus (unless you utilized MFIFO to mirror a small portion of that to the buffer, which was much more difficult and underutilized than otherwise since it was likely to stall the other pipelines); the exact same bus Pentium 4s would use a few months after the PS2's initial launch (3.2-6.4GB/s). It's more akin to a (very large) 4MB on-chip cache than proper RAM/VRAM.

As to the PS3 being half that, that's more a design decision of the PS3. They built the machine around a universal bus (XDR) versus using bespoke interconnects. If you look at the Xbox 360, they designed a chip hierarchy similar to the PS2 architecture, with its 10MB eDRAM (at 64GB/s) for GPU-specific operations.

As to those speeds being unique: that bandwidth was made possible via eDRAM (on-chip memory). Other bespoke designs utilized eDRAM too, and the POWER4 (released around the same time) had a per-chip 1.5MB L2 cache running at over double that bandwidth (100GB/s). It was also able to communicate chip-to-chip (up to 4x4 SMP) at 40GB/s and with its L3 at 44GB/s (both off-chip buses). So other hardware was definitely achieving similar and greater bandwidths; it just wasn't happening in home PCs.


I think that’s fair. It was, in effect, a cache and DMA target. A similar scheme was present in the Cell, where the SPEs had 256KB of embedded SRAM that really needed to be addressed via DMA to not drag performance into the ground. For low-level optimization junkies, it was an absolute playground of traps and gotchas.

Edit: if memory serves, SPE DMA list bandwidth was just north of 200GB/s. Good times.


Growing up with the tech press in this era, “commodity RDRAM” is a funny phrase to read!

As I recall, the partnership between Intel and Rambus was pilloried as an attempt to re-proprietarize the PC RAM interface, in a similar vein to IBM’s Micro Channel bus.


Why is it funny? You could just go to a computer store and buy RDRAM.


It wasn’t a commodity! It was a single-owner proprietary technology. You needed a license from Rambus to manufacture it. And at least at the time it commanded a premium price over true commodity DRAM.

People on Slashdot got really worked up when Intel signed a deal with Rambus to make RDRAM the only supported technology on Intel x86—from which they relatively quickly backtracked.

Anandtech (sniff) of course has great contemporary background:

https://www.anandtech.com/show/545


"Commodity" meaning "something that could be bought at the store".


From the Rambus store!


Ok


PS2 compatibility on PS3 went through 3 phases. Launch models had the EE+GS+RDRAM (full PS2 hardware), models just after had only the GS, and finally emulation was done entirely in software. With a CFW you can use that last method to play most (ISO backed-up) PS2 games.


Unfortunately the PS2 emulation on PS3 left a lot to be desired


> Using a CFW can utilize the latest method to play most (ISO backed-up) PS2 games

Can it not use the first method if the hardware is available?


Sure, and if available that is actually the default when mounting an ISO. I'm just saying essentially every PS3 can emulate PS2 games, not only those with (full or partial) PS2 hardware. Worth mentioning that PSN classics (and converted games) use software emulation on every PS3.


Huh, TIL!

(I never had a PS3; AFAIR my PSP just downclocked its CPU. I remember battery life playing classics was awesome...)


Thanks for sharing!

Back then, the appeal of console games to me was that, beyond the convenience factor, they were very specialised hardware for one task: running games.

I remember playing FF12 (IZJS) on a laptop in 2012 and it ran very stably. Granted, that was 6 years post-release, but had the emulator issues been fully solved by then?

Re: wild times for low-level programming, I remember hearing that Crash Bandicoot had to duck down into MIPS assembly to eke out every extra bit of performance on the PS1.


I heard something similar with Jak and Daxter, where they took advantage of the embedded PS1 hardware on the PS2 to scrape together some extra performance.

I very much enjoyed this video that Ars Technica did with Andy Gavin on the development of Crash Bandicoot: https://www.youtube.com/watch?v=izxXGuVL21o


Most PS2 games used the PS1 hardware for audio/SFX, which freed the main CPU for rendering/gameplay.

I believe Gran Turismo 3 used the PS1 hardware for everything except rendering, which was the kind of nuts thing you could do if your game was single-platform and had a huge budget.


Pretty sure all 3D PS1 games tapped into assembly for performance.

The most “famous” thing about Crash programming is probably that it’s all in lisp, with inline assembly.


I don’t think either of these things are true.

By the mid-90s, C compilers were good enough - and CPUs were amenable enough to C - that assembly was only really beneficial in the most extreme of cases. Sony designed the PlayStation from the jump to be a 3D machine programmable in C - they were even initially reluctant to allow any bit-bashing whatsoever, preferring that devs use SDKs to facilitate future backwards compatibility. No doubt some - even many - PS1 games dropped down into assembly to squeeze out more perf, but I doubt it’s anything like “all”.

As for Lisp, Crash used a Lisp-like language, “GOOL”, for scripting, but the bulk of the game would’ve been native code. It was with Jak & Daxter that they really started writing the game primarily in a Lisp (GOAL, GOOL’s successor).


Definitely not. I'm reverse-engineering Tenchu: Stealth Assassins and the original Japanese release's game code is written entirely in C as far as I can tell.

It does use the Psy-Q SDK which contains assembly, but it's not something that the game developers have written.


Both PS1 and N64 games were beyond the generation where direct assembly was required to achieve necessary performance.

Compilers were good enough by that point in time for the vast majority of games.


Most games were primarily C. 3D performance came from the graphics hardware, which had straightforward C APIs and dedicated texture RAM. The machine lacked a floating point processor, so I think we wrote some fixed point math routines in assembly, but that's basically it.
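
For anyone curious what those routines boil down to, here is a minimal sketch of a fixed-point multiply. The 20.12 format, the names, and the C++ are my own illustrative assumptions; the parent is describing hand-written PS1 assembly.

  // Minimal 20.12 fixed-point sketch (format and names are assumptions,
  // not the parent's actual routines, which were PS1 assembly).
  #include <cstdint>

  using fixed = int32_t;                      // 20 integer bits, 12 fractional bits
  constexpr int   FRAC_BITS = 12;
  constexpr fixed FX_ONE    = 1 << FRAC_BITS; // 1.0 in this format

  // Multiply in 64 bits, then shift back down so the radix point stays put.
  constexpr fixed fx_mul(fixed a, fixed b)
  {
      return (fixed)(((int64_t)a * b) >> FRAC_BITS);
  }

  // Sanity check: 1.5 * 2.0 == 3.0 in 20.12.
  static_assert(fx_mul(FX_ONE + FX_ONE / 2, 2 * FX_ONE) == 3 * FX_ONE, "fixed-point mul");

The wider intermediate is the whole trick: without it, the product overflows before the shift.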


The PS1 is famous among game developers for being the first home-market games console with a C SDK, instead of only assembly as the programming option.

All games tap into assembly in some form or another, even nowadays.


Would programmers now be able to make God of War 2, FF12, Jak 2 & 3, and Ratchet and Clank 3 & Deadlocked? The PS2, the slim especially, is an incredible machine. Scale the games up to 1080p or 1440p/4K and they still look great.


It’s hard to compare memory bandwidth of these systems because of the various differences (e.g. different RAM types). I know that the theoretical RAM bandwidth for the N64 is way higher than what you can practically achieve under typical conditions.


I totally agree, except when it comes to emulation. It is immensely hard to real-time emulate if some facet of your host hardware isn’t at least a little bit faster than the corresponding element in the emulated system. Then you have to lift the abstraction and often find that the rest of the system reflects this specificity.

At my last job, we had ASICs that allowed for single-sample audio latency with basic mixing/accumulation functions for pulling channels of audio off of a bus. It would have been tragically expensive to reproduce that in software, and the required hardware to support a pure software version of that would have been ridiculous.

We ended up launching a new platform with a different underlying architecture that made very different choices.


Most third parties weren’t able to exploit the PS2 properly until Sony revamped their SDK and collaborated with the Gran Turismo folks to show what was actually possible.


Not sure what you mean. The entire system was available to everyone on day 1 through the manuals. The introduction of the VCL VU assembler/preprocessor made VU development easier (single instruction stream, automatic register allocation). PATH3 texture uploads maybe?


Just because someone gets a driver's license doesn't mean they are fit to drive in an F1 GP.

Likewise, most game developers weren't fit to drive the PS2 without additional help from Sony.


It’s fairly well known in the specific domain space the author is dealing with, so I think it’s fair to treat it as implicit.

But for more reading https://www.psdevwiki.com/ps2/Graphics_Synthesizer

The author has a bunch of other things in their post they don’t expand upon either, which are significantly more esoteric, so I think this is very much geared toward a particular audience.

A few link outs would have helped for sure.


> Pray you have programmable blending

I have prayed for programmable blending via "blending shaders" (and FWIW programmable texture decoding via "texture shaders" - useful for custom texture formats/compression, texture synthesis, etc.) since I first learned about pixel shaders way back in the early 2000s.

Somehow GPUs got raytracing before programmable blending, when the former felt like some summer night's dream and the latter is just about replacing yet another fixed-function block with a programmable one :-P

(still waiting for texture shaders though)



The mobile PowerVR GPUs had programmable blending the last time I touched them; in fact, it's the only kind of blending they had. Changing blend states on the PS Vita took ~1ms, not pretty.


Isn't 'VK_EXT_fragment_shader_interlock' kind of like programmable blending? Or Raster-order-views in DX land.

This is a great example of using it: https://vulkan.org/user/pages/09.events/vulkanised-2024/vulk...


Well, there are mesh shaders, work graphs, CUDA, plain C++ shaders.

OTOY does all their rendering with compute nowadays.


I mean, most of that could be emulated in Vulkan or DX12; I'm not sure about other APIs. I'm curious what the use case would be, though. I'm also fairly certain it can be done to a degree, but without a convincing use case it's hard to justify the implementation work.


I really hope someone rewrites dynarmic (https://github.com/yuzu-mirror/dynarmic) like that and writes a blog post about it.


For anyone wondering:

> A dynamic recompiler is a type of software that translates code from one instruction set architecture (ISA) to another at runtime, rather than ahead of time. This process allows programs written for one platform to be executed on another platform without needing to modify the original code. Dynamic recompilation is often used in emulators, virtual machines, and just-in-time (JIT) compilation systems.
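
And, since that description is abstract, here is a toy C++ sketch of the core structure: translate a guest block once, cache it, and reuse it on later visits. The two-instruction guest ISA and every name here are invented for illustration, and host lambdas stand in for the machine code a real dynarec (like Dolphin's JIT) would emit.

  // Toy dynarec-shaped block cache; lambdas stand in for emitted host code.
  #include <cstdint>
  #include <functional>
  #include <unordered_map>
  #include <vector>

  struct Guest { uint32_t pc = 0; int32_t reg[4] = {}; };   // guest CPU state

  enum class Op : uint8_t { ADDI, JMP };                    // made-up guest ISA
  struct Insn { Op op; uint8_t rd; int32_t imm; };

  using Block = std::function<uint32_t(Guest&)>;            // returns next guest pc

  Block translate(const std::vector<Insn>& code, uint32_t pc)
  {
      // Gather one basic block: straight-line ADDIs up to the terminating JMP.
      std::vector<Insn> body;
      while (code[pc].op != Op::JMP) body.push_back(code[pc++]);
      uint32_t target = (uint32_t)code[pc].imm;
      return [body, target](Guest& g) {
          for (const Insn& i : body) g.reg[i.rd] += i.imm;  // "execute" the block
          return target;
      };
  }

  void run(const std::vector<Insn>& code, Guest& g, int blocks_to_run)
  {
      std::unordered_map<uint32_t, Block> cache;            // guest pc -> translated block
      while (blocks_to_run--) {
          auto it = cache.find(g.pc);
          if (it == cache.end())                            // translate on first visit only
              it = cache.emplace(g.pc, translate(code, g.pc)).first;
          g.pc = it->second(g);
      }
  }

The win over a plain interpreter is that the decode work happens once per block instead of once per executed instruction.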

Is this what Dolphin does most of the time, or is it all handcrafted at the assembly level?


My (limited) understanding is that most serious emulation is done via dynarec these days, particularly past the fourth / fifth gen of consoles.

For newer consoles, traditional interpreted emulation isn't going to be fast enough to be playable, so dynarec is the only realistic option.[0] For the older systems, it's more of a nice-to-have performance boost, particularly as a lot of people now like to play those with demanding upscalers and shaders. (Some older games, like Sonic 2, were designed around composite video artefacts and don't look right without CRT shaders.)

[0] Things get fuzzier on near-current gen consoles, which are more like PCs, and may hypothetically be best emulated with some kind of translation layer not unlike Proton.


Citra, Panda3DS, Vita3K, touchHLE, unidbg, and more use dynarec for ARM CPU emulation, and Dolphin/Cemu use the same concept to emulate PowerPC chips too.

PS4 emulators use a different approach, though.


My favorite part of the GS was the sheer insanity of the bus, 2560 bits wide in total with a clever split on the cache. The PS3 felt like a step down in some ways, blending especially.


How does this approach compare to Dolphin's ubershader?


Basically, not at all. Dolphin's ubershader does one thing: simulate fixed-function blending/texturing using modern flexible hardware. (It was already an old technique when Dolphin adopted it, by the way.) This project is a complete renderer, with the rasterizer and all, and as you can see from the text, it includes an ubershader for the blending. The shader doesn't draw triangles; it just gets called for each point in the triangle with some inputs and gets to decide what color it is.

It's vaguely like comparing a full CPU emulator with something that implements the ADD and MUL instructions.


> Dolphin's ubershader does one thing: simulate fixed-function blending/texturing using modern flexible hardware.

Just to clarify, Dolphin's specialized shaders simulate fixed-function blending/texturing too. What's different about ubershaders is that a single shader can handle a wide variety of fixed-function states whereas a specialized shader can only handle a single state.

Thus, whereas specialized shaders have to be generated and compiled on the fly, resulting in stutter, ubershaders can all be pre-compiled before running the game. Add to this the capability to asynchronously compile specialized shaders to replace the ubershaders, and the performance loss of ubershaders becomes negligible. A rare case of having your cake and eating it too.
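
A rough CPU-side caricature of the difference, in C++ rather than actual GLSL/SPIR-V (all the state fields and names are made up): the ubershader reads the fixed-function state at runtime, so one compiled function covers every state, while the specialized version bakes the state in at compile time, so every new state combination needs another compile.

  // Illustrative only: the blend state, names and formats are invented.
  #include <algorithm>
  #include <cstdint>

  struct BlendState { bool additive; bool alpha_test; uint8_t alpha_ref; };
  struct Pixel { uint8_t r, g, b, a; };

  // "Ubershader" style: one function, branches on the state for every pixel.
  Pixel shade_uber(const BlendState& s, Pixel src, Pixel dst)
  {
      if (s.alpha_test && src.a < s.alpha_ref) return dst;  // pixel rejected
      if (s.additive)
          return { uint8_t(std::min(src.r + dst.r, 255)),
                   uint8_t(std::min(src.g + dst.g, 255)),
                   uint8_t(std::min(src.b + dst.b, 255)), src.a };
      return src;                                           // plain overwrite
  }

  // "Specialized" style: the state is baked in, so the branches vanish,
  // but each (Additive, AlphaTest) combination is a separate function
  // that has to be compiled before the game can use it.
  template <bool Additive, bool AlphaTest>
  Pixel shade_specialized(uint8_t alpha_ref, Pixel src, Pixel dst)
  {
      if (AlphaTest && src.a < alpha_ref) return dst;
      if (Additive)
          return { uint8_t(std::min(src.r + dst.r, 255)),
                   uint8_t(std::min(src.g + dst.g, 255)),
                   uint8_t(std::min(src.b + dst.b, 255)), src.a };
      return src;
  }

In Dolphin's terms, the first flavour can be compiled before the game starts, while the specialized flavours are swapped in asynchronously to win back the per-pixel branching cost.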


This is very true. And to make matters even more confusing, certain drivers were at some point capable of constant-folding shaders on the fly, so that ubershaders essentially got turned back into a bunch of specialized shaders, negating the entire concept. I don’t believe this actually affects Dolphin, though (I think the drivers in question have given up this optimization).


What's meant by top-left raster?


The top-left rule used in triangle rasterisation (https://en.wikipedia.org/wiki/Rasterisation#Triangle_rasteri...).
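
In case a concrete sketch helps: the rule just decides which triangle "owns" a sample that lands exactly on a shared edge, so neighbouring triangles neither double-draw nor drop that pixel. A minimal C++ illustration (the conventions - y growing downward, clockwise screen-space winding - and all names are my own, not from the article):

  struct Vec2 { float x, y; };

  // Edge function: twice the signed area of (a, b, c); zero means c lies on edge a->b.
  // With y-down coordinates and clockwise winding it is positive inside the triangle.
  float edge_fn(Vec2 a, Vec2 b, Vec2 c)
  {
      return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
  }

  // Top edge: exactly horizontal and running right (interior is below it).
  // Left edge: running upward (interior is to its right).
  bool is_top_left(Vec2 a, Vec2 b)
  {
      bool top  = (a.y == b.y) && (b.x > a.x);
      bool left = (b.y < a.y);
      return top || left;
  }

  // A sample p is covered if it is strictly inside every edge, or sits exactly
  // on an edge that the top-left rule assigns to this triangle.
  bool covered(Vec2 v0, Vec2 v1, Vec2 v2, Vec2 p)
  {
      auto test = [&](Vec2 a, Vec2 b) {
          float e = edge_fn(a, b, p);
          return e > 0.0f || (e == 0.0f && is_top_left(a, b));
      };
      return test(v0, v1) && test(v1, v2) && test(v2, v0);
  }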



