The simplest trick for detecting old ARM emulation - ISTR it was used in some Game Boy Advance copy protection: store a booby-trap instruction at PC+4 (i.e. the very next one). A real ARM has a pipeline that is fetching PC+8 and decoding PC+4 while executing PC, so the newly-stored instruction should have no effect. An emulator which didn't model the hardware pipeline would execute it.
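In an emulator, modelling that just means keeping a two-entry prefetch window. Something like this minimal sketch (names and structure mine, not from any real emulator):

```c
#include <stdint.h>

/* Sketch: model the 3-stage ARM pipeline so that a store to PC+4 hits
 * an instruction that was already fetched and decoded, and therefore
 * has no effect -- matching real hardware. */
typedef struct {
    uint32_t pc;      /* address of the instruction about to execute */
    uint32_t current; /* word for pc, fetched two cycles ago */
    uint32_t next;    /* word for pc+4, fetched one cycle ago */
} Pipeline;

uint32_t mem_read32(uint32_t addr);  /* assumed bus helper */
void     execute(uint32_t insn);     /* assumed instruction dispatch */

void step(Pipeline *p) {
    uint32_t insn = p->current;
    p->current = p->next;                /* slide the window... */
    p->next    = mem_read32(p->pc + 8);  /* ...and fetch ahead */
    execute(insn);  /* a store to pc+4 here changes memory, but the
                       already-fetched 'current' still executes next */
    p->pc += 4;     /* (a taken branch would refill the window) */
}
```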
The Texas Instruments TMS320C40 digital signal processor had even weirder pipeline issues:
- Branch delay slots (https://en.wikipedia.org/wiki/Delay_slot), where one or more instructions after a branch would be executed before the branch actually occurred (see the sketch after this list).
- Load delay slots, where values stored into registers weren't guaranteed to appear until some later instruction. I believe the value in the register was undefined for several cycles?
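To make the branch delay slot concrete, here's how an interpreter might model a single slot (a hedged sketch; IIRC the C40's delayed branches actually expose three slots, one is shown for brevity):

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical interpreter modelling one branch delay slot: the
 * instruction after a taken branch still executes before the jump. */
typedef struct {
    uint32_t pc;
    bool     branch_pending;
    uint32_t branch_target;
} Cpu;

uint32_t fetch(const Cpu *c);           /* assumed: reads word at pc */
void     execute(Cpu *c, uint32_t op);  /* may set branch_pending/target */

void step(Cpu *c) {
    bool     take   = c->branch_pending;  /* set by the previous insn */
    uint32_t target = c->branch_target;
    c->branch_pending = false;

    execute(c, fetch(c));                 /* delay slot runs regardless */

    c->pc = take ? target : c->pc + 4;
}
```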
Writing tightly-optimized assembly code for these chips was pretty horrible, sort of like playing an unusually tasteless Zachtronics clone.
It was also kinda awesome because, as long as you were willing to spend days to optimize one page of code, you could get so much performance out of it.
For example, deliberately using the fact that multiplies only write their result into the destination register ~6 cycles later means you can use that register for a bunch of other stuff in the meantime, and then on the 6th cycle the result would magically appear.
Basically, for those 6 cycles, neither the source operands nor the destination of the multiplication tied up any registers.
Obviously this is also pipelinable - you can start more multiplies while the first is running, using the same source and destination registers, but meanwhile you've used other instructions to load more data into the inputs and do something else with the outputs.
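In emulator terms you could model that result latency with a pending-writeback slot, something like this (illustrative sketch; structure and timing are made up, and only one in-flight multiply is modelled where the real pipeline overlaps several):

```c
#include <stdint.h>

#define MUL_LATENCY 6   /* illustrative; real C40 timings differ */

/* Sketch of a delayed-writeback multiply: the product only lands in
 * the destination register several cycles later, so the source and
 * destination registers are free for other work in between. */
typedef struct {
    uint32_t regs[32];
    int      mul_dst;        /* -1 when no multiply is in flight */
    int      mul_countdown;
    uint32_t mul_value;
} Cpu;

void start_mul(Cpu *c, int dst, int a, int b) {
    c->mul_dst       = dst;
    c->mul_countdown = MUL_LATENCY;
    c->mul_value     = c->regs[a] * c->regs[b];  /* sources read now */
}

void end_of_cycle(Cpu *c) {
    if (c->mul_dst >= 0 && --c->mul_countdown == 0) {
        c->regs[c->mul_dst] = c->mul_value;  /* "magically appears" */
        c->mul_dst = -1;
    }
}
```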
Another sibling comment here references it obliquely, but on x86 the prefetch queue produced similar behaviour until Intel decided to detect self-modifying code on the Pentium and newer CPUs, so that modifying the instruction about to be executed will always have an effect.
However, someone much later found another undetected edge-case: a self-overwriting repeated string instruction.
Some pipelined CPUs have retained compatibility with self-modifying code: they detect when you overwrite an instruction that is in the pipeline and flush it.
x86 has that machinery, although I'm not sure whether it was eventually dropped in the 64-bit variant.
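The emulator-side analogue of that machinery is tracking writes that land on code you've already translated or prefetched. Roughly (a hypothetical sketch, not any particular emulator's scheme):

```c
#include <stdint.h>

#define PAGE_SHIFT 12
#define NUM_PAGES  (1u << (32 - PAGE_SHIFT))

/* Track which guest pages hold already-translated code and invalidate
 * those translations when the guest writes to them -- the software
 * analogue of the Pentium's pipeline flush on self-modifying code. */
static uint8_t page_has_code[NUM_PAGES];

void invalidate_translations(uint32_t page);  /* assumed JIT hook */

void guest_write8(uint8_t *ram, uint32_t addr, uint8_t val) {
    ram[addr] = val;
    uint32_t page = addr >> PAGE_SHIFT;
    if (page_has_code[page]) {
        invalidate_translations(page);  /* flush, like the pipeline */
        page_has_code[page] = 0;
    }
}
```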
Mednafen is fine too. Although with VBA... there's VBA-M, which is much better than the original one, whose audio code was very bad and glitched a lot under GNU/Linux and BSD.
That is why emulation, when targeting 100% accuracy, is a craft in our industry. Not only do you need to know each and every quirk the original hardware/software has, you also need to replicate it, however peculiar it is. And if that alone isn't challenging enough, consider the potential performance impact.
Emulators have to be pragmatic about accuracy. When emulating more modern systems it's generally not feasible to target 100% hardware accuracy and usable performance at the same time, so they tend to accept compromises which are technically deviations from the real hardware but usually don't make any observable difference in practice. Anything that uses a JIT recompiler is never going to be perfectly cycle-accurate to the original hardware, but it usually doesn't matter unless the game code is deliberately constructed to break emulators.
Dolphin had to reckon with that balance when a few commercial Wii games included such anti-emulator code, which abused details of the real Wii CPU's cache behavior. Technically they could have emulated the real CPU cache to make those games work seamlessly, but the performance overhead (likely a 10x slowdown) would have made them unplayable, so they hacked around it instead.
I once wrote something that would hard-lock a Cortex-A8 but not the Cortex-A9 we shipped on. To my knowledge, nobody ever tracked down why our app, once exfiltrated from our device, would crash slightly older phones.
A8 erratum. This was ages ago, but if I recall correctly you could place a Thumb-2 instruction straddling two pages, only one of which was loaded in the TLB. If you got everything right, the A8 would hang without trapping.
Edit: it was erratum #657417, long since scrubbed from arm.com
The A8 errata doc is at https://developer.arm.com/documentation/prdc008070/latest/ these days and does have a description of 657417 with enough detail to make writing a reproducer possible. Instructions crossing page boundaries are tricky beasts :-)
You assume an anti-piracy attempt when GP, from my reading, made no such statement. More of a mystery, but who cares because the problem hardware wasn’t what they shipped on.
If it hardlocked an A8 but not an A9, chances are very high that an emulator would run it with no problem, because nobody deliberately tries to emulate the kind of CPU bug that lets an app hardlock the CPU. GP appears to have been interested in deterring people from running their code on non-authorised real hardware at the time, not targeting emulator users.
Bingo! Didn't want someone running the new product's app on the old product's hardware. The company was new to building non-RTOS devices; its earlier products were tightly hardware-bound, and it wanted similar restrictions.
That's true, the small differences between a pragmatic "accurate enough" emulator and real hardware can matter for speedrunners. The difference between real hardware running at 60fps and a principled cycle-accurate emulator running at <0.1fps would matter more, though.
For the SNES and earlier it's feasible to have exceptional accuracy and still usable performance, but for anything modern it's just not happening. Imagine trying to write a cycle-accurate emulator core for a modern CPU with instruction reordering, branch prediction, prefetching, asynchronous memory, etc., never mind making it go fast.
I think the cutoff can be moved up to the original PlayStation now.
>but for anything modern it's just not happening.
Which arguably explains a cultural rift in arcade emulation circles. MAME's philosophy is cycle accuracy, which can work for arcade hardware up to the early 3D systems, whether bespoke (such as Namco's System 22) or console-derived (Namco's System 1x series, which all derive from the original PlayStation hardware). For newer arcade titles, which are just beefed-up period PCs, that kind of emulation philosophy doesn't suffice for playable gameplay.
The good news is that modern systems are so unpredictable relative to each other that games can be relied upon to not require cycle-accuracy. IIRC the cycle timings can differ between different units of the same model.
I wonder how mainframe emulators (which are sometimes used to run legacy, very critical software on modern hardware) manage to do it. Do they go for full, complete emulation, as in implementing the entire hardware in software?
Mainframes typically execute batch processes on a CPU. Much simpler than a game console with a GPU. Cycle-accurate emulation is less relevant for mainframes.
You really don't have to understand any deep wizardry to get started (or, for the most part, even to finish). For the most part, you just look at a specification, and implement what it says. It requires some code architecting skill to not make a mess, but there are common patterns and it becomes a lot easier once you've built one or two emulators.
And you almost never need to understand electronics. You're only emulating behavior. When someone discovers a bug in the behavior of the original hardware, you usually just need to special-case it in your emulator. It might help to know some electronics to understand how those behaviors came to be, but that's more of historical interest than practical use.
There are certain unique challenges, but it's nothing too difficult. When there's an issue, you're usually debugging three things at once: Your understanding of the hardware, your implementation of the emulator, and the game you're emulating. It can be hard to pin down the exact problem. But here, I encourage you to just hack something together. It's not clean, but all emulators are full of special cases that try to somehow get popular games working. If a couple unclean hacks mean you can get a game working, just do so. You don't really need to exactly implement the original hardware's behavior. Just get the game working.
You start by reading hardware documentation and you don't need to understand the machine at an electronic level. It's not a simulation of digital circuits. It doesn't need to be that complicated. An 8-bit CPU is a simple state machine with only a few bytes of state (registers). You can read the program one byte at a time and simulate whatever operation the CPU would do after reading that byte. They are very simple operations like adding and subtracting numbers, loading and storing bytes.
Your 6502 CPU emulator would read the next few bytes of the program, interpret the bytes as an instruction, execute the instruction, which involves updating a couple of registers/counters in the CPU, doing some arithmetic or bitwise operations, and possibly loading or storing a byte of data from one location to another. This process is then repeated in an endless loop.
It is a simulation of the fetch-decode-execute cycle.
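For example, a minimal 6502-flavoured core really is just that shape. A toy sketch with only two opcodes wired up (flag updates omitted for brevity):

```c
#include <stdint.h>

/* Toy 6502-style core: a handful of registers plus an endless
 * fetch-decode-execute loop. */
typedef struct {
    uint16_t pc;
    uint8_t  a, x, y, sp, status;
    uint8_t  mem[65536];
} Cpu;

void step(Cpu *c) {
    uint8_t opcode = c->mem[c->pc++];      /* fetch */
    switch (opcode) {                      /* decode */
    case 0xA9:                             /* LDA #imm */
        c->a = c->mem[c->pc++];            /* execute */
        break;
    case 0x8D: {                           /* STA abs */
        uint16_t lo = c->mem[c->pc++];
        uint16_t hi = c->mem[c->pc++];
        c->mem[(uint16_t)((hi << 8) | lo)] = c->a;
        break;
    }
    /* ...one case per opcode... */
    }
}

void run(Cpu *c) {
    for (;;) step(c);                      /* the endless loop */
}
```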
If you have good documentation then you're just following the specs and turning that into code. Next level difficulty is reverse engineering the hardware, figuring out how it works when there isn't any documentation. These guys did that for NES back in the 90s and early 2000s, writing little test ROMs to see what the hardware does, in addition to whatever official or unofficial developer docs they could find.
For instance, I ported a 6502 interpreter from UNIX to Classic Macintosh back in the day. This was to play SID music files. So long as it ran fast enough, clock cycle accuracy wasn’t important.
I tried to get into it many moons ago, and the guidelines were to start from something very simple and well documented, and build your skills from there. I still have the itch but lack the time :(
FPGA implementations are often based on code or documentation from software emulation projects. An FPGA version of a PS2 has no guarantee of not reproducing the same or a similar bug.
The point here is that it is actually viable to reimplement such bugs without incurring significant performance penalties.
A software emulator has to be able to execute a single PS2 instruction in the same amount of wall time as it'd take on the original hardware. With a regular multiplication that's fairly easy: x86 also has multiplication, so you can do a 1:1 translation and be fairly certain it's within your time budget. With a bugged multiplication you need to do a regular x86 multiplication, and wrap that in a few dozen other instructions to add the buggy behaviour to it. There's a pretty decent chance it's simply too expensive!
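To make the cost concrete, the wrapper ends up shaped like this (a hedged sketch: apply_guest_quirks() is a placeholder, not the PS2's actual rounding/flushing behaviour):

```c
#include <stdint.h>
#include <string.h>

/* Emulating a non-IEEE guest multiply on an IEEE host. The point is
 * only that the wrapper costs far more than the single host multiply
 * inside it. */
uint32_t apply_guest_quirks(uint32_t bits);  /* hypothetical fixups */

uint32_t emulate_guest_fmul(uint32_t a_bits, uint32_t b_bits) {
    float a, b, r;
    memcpy(&a, &a_bits, sizeof a);
    memcpy(&b, &b_bits, sizeof b);
    r = a * b;                         /* the one cheap host instruction */
    uint32_t r_bits;
    memcpy(&r_bits, &r, sizeof r);
    return apply_guest_quirks(r_bits); /* dozens of extra instructions */
}
```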
When you're writing an FPGA emulator you are able to recreate the buggy multiplication directly in hardware. There's no additional wrapping needed, so (beyond figuring out intended behaviour) it's not any more costly than emulating a non-buggy multiplication. It's far easier to do a cycle-accurate emulation because you have direct control over the transistors!
Higan is a cycle-perfect SNES emulator, and it's very single-core CPU-intensive. This is what the FAQ [0] says:
> Full-speed emulation for the Super Famicom base unit requires an Intel Core 2 Duo (or AMD equivalent), full-speed for games with the SuperFX chip requires an Intel Ivy Bridge (or equivalent), full-speed for the wireframe animations in Mega Man X2 requires an even faster computer. Low-power CPUs like ARM chips, or Intel Atom and Celeron CPUs, generally aren’t fast enough to emulate the Super Famicom with higan, although other emulated consoles may work.
Work can't be split across cores (according to the FAQ) because that would compromise the accuracy of the timing.
It may be that the PS2 has similar problems while being more powerful than the SNES.
Idk about the PS2, but the N64 (a generation older and much slower) still doesn't have a cycle-perfect emulator that runs in real time.
Remember that you can't just perfectly emulate the CPU; you must also perfectly emulate the GPU, since they share the memory bus, so one can slow the other.
You must be young. A PS2 can run a modern-ish Linux with Window Maker and decode MP3s, maybe FLAC and Opus, and some MP3+XviD/DivX videos. That's not so far from a cycle-accurate emulation of a low-end Windows 98 era PC. A PS2 with Linux can comment on this page, for instance.
With TLS and all, if you use Dillo as the web browser, since it uses MbedTLS. The TLS handshake might last a few seconds, but that's it.
Try running that on a cycle accurate emulator without bringing an i7 to its knees.
A similar PC would be a Pentium II/III at 450-500 MHz with 64MB of RAM running Damn Small Linux or NetBSD with a small Window Maker setup. Bear in mind that you could run non-accurate SNES emulation on that machine, with sound output at 8000 Hz and no filters, enough for most common games such as Chrono Trigger and Super Mario World, with lots of hacks fixing the imprecise timing under ZSNES.
You would probably need a terahertz CPU for cycle-accurate PS2 emulation.
Cycle-accurate PS2 emulation means emulating the state of the CPU, GPU, other interacting processors, and their various interconnecting busses at clock cycle granularity and possibly at sub-cycle granularity if the processors are running asynchronously.
The IOP (the PS1 processor, reused for I/O), SPU2 (sound processor), IPU (MPEG decoding), EE (main CPU), VU0, VU1, the DMA controller and finally the GS (GPU) all run asynchronously. It's naive to think perfect cycle accuracy for this machine might be possible with any hardware that exists today. PCSX2 is lucky to have VU1 on its own thread, and it still doesn't work perfectly IIRC; VU0 -> VU1 access is a little sketchy with it enabled.
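A fully synchronous core would have to interleave all of those against a common time base, something like this (unit names from the comment above; the scheduling scheme and clock ratios are mine and purely illustrative, and a truly cycle-accurate core would interleave even finer than this):

```c
/* Hypothetical lock-step scheduler: every unit advances by its share
 * of a common time base each outer iteration, so no unit ever runs
 * ahead of the others. That coupling is what makes this so slow. */
void ee_tick(void);   void gs_tick(void);   void iop_tick(void);
void spu2_tick(void); void vu0_tick(void);  void vu1_tick(void);

typedef struct { void (*tick)(void); int cycles; } Unit;

static const Unit units[] = {
    { ee_tick,   8 },  /* ratios illustrative only */
    { vu0_tick,  8 }, { vu1_tick, 8 },
    { gs_tick,   4 },
    { iop_tick,  1 }, { spu2_tick, 1 },
};

void run_one_master_step(void) {
    for (unsigned i = 0; i < sizeof units / sizeof units[0]; i++)
        for (int c = 0; c < units[i].cycles; c++)
            units[i].tick();
}
```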
Fixing this bug would be part of fixing a bunch of other floating point bugs, more specifically rounding and clamping.
Yes, software floating point would be slower, but the general solution would probably follow the PS4's PS2 emulator, where each game can have whitelisted sections of code that take the software floating point path.
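That per-game whitelist could be as simple as a range check on the guest PC (entirely hypothetical structure and addresses):

```c
#include <stdint.h>
#include <stdbool.h>

/* Only code ranges known to be precision-sensitive take the slow
 * software floating point path; everything else keeps host floats. */
typedef struct { uint32_t start, end; } Range;

static const Range softfloat_ranges[] = {
    { 0x00123400, 0x00123800 },  /* made-up: a game's sensitive routine */
};

bool needs_softfloat(uint32_t pc) {
    for (unsigned i = 0; i < sizeof softfloat_ranges / sizeof softfloat_ranges[0]; i++)
        if (pc >= softfloat_ranges[i].start && pc < softfloat_ranges[i].end)
            return true;
    return false;
}
```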
Why would you want to emulate an old crappy MIPS CPU using a relatively expensive FPGA? The whole idea of emulating old consoles is to be independent of the hardware so you can play your old games on your computer or phone.
I second that. I wrote an NES emulator twenty years ago because it was fun, not for any practical purpose. I had no idea what I was doing, but I remember being in awe of the NES after reading the detailed hardware spec (found on Zophar's Domain, a doc by Yoshi if memory serves me well). I promptly decided to write an emulator in whatever language I was learning at the time.
The result was terrible, but I had tremendous fun!
Because the implementation on your computer or phone behaves slightly differently from actual hardware, sometimes to the point of being unplayable. If you can't get your hands on a genuine working console, a cycle-accurate FPGA implementation is the next best thing.
Terrence Howard (actor in Iron Man, the Empire TV show, etc.) believes he has discovered "a new math" where 1x1=2. I believe he has gained recent notoriety because he was on the Joe Rogan podcast, where he got a platform to present his beliefs to many people, but he has held these beliefs for many years.
As far as I can tell, his reasoning is literally that 2x2=4, so if you divide both sides by 2, you get 1x1=2.
A PS2 developer may have wanted to do so as a type of copy protection. Somebody implementing a PS2 emulator today would want to know about these techniques in order to archive and document those games.
For current systems, sure, a developer might want to make it difficult to run their game in an emulator to deter piracy. There were a handful of Wii games which did that since Dolphin matured while new Wii games were still being released. Nobody is making new commercial PS2 games anymore though, so any anti-emulator trick that wasn't deployed back in the day is never going to matter in practice.
In that era, wasn't emulation so far behind the real hardware that you wouldn't have to worry about emulators until at least 5 years after the console came out?
“Sony had accused Bleem! of engaging in unfair competition by allowing PlayStation BIOSs to be used on a personal computer (…) The Judge had rejected the notion, and issued a protective order to "protect David from Goliath".”
“(As) Bleem! had (…) to deal with defense costs of $1 million per patent”
Fun fact: GBA emulation and homebrew development actually started before the system's release [1]. IIRC, the official devkit was leaked, then someone wrote a summary of the low-level documentation and started circulating that, which (since the CPU had an off-the-shelf ARM core) was enough to start making rudimentary emulators and tech demos.
Was anyone else lost by the title, wondering why a mouse or keyboard should do math? I spent too much time figuring out that this is about the PlayStation 2, not the Personal System/2 ports used to connect mice and keyboards.
To all the downvoters: just because something is obvious to you doesn't mean that it is obvious to everyone else. Three-character abbreviations can make context discovery really difficult, as simply plopping the abbreviation into Google will quite often produce mostly unrelated results.
edit: described in more detail here, among other emulation-busting measures from 2004: https://mgba.io/2014/12/28/classic-nes/