Hacker News new | comments | show | ask | jobs | submit login
Branchless Doom (github.com)
454 points by strangecasts 4 months ago | hide | past | web | favorite | 80 comments

Thank heavens for this advancement! I imagine those using Doom to manage system processes (https://www.cs.unm.edu/~dlchao/flake/doom/chi/chi.html), will appreciate the security measures implemented in this version. I can only hope these two projects can find common purpose.

Several years back I convinced a coworker to try and install that. He had a mac and I was running FreeBSD or something and couldn't run it. He'd been running some sort of simulation for several days on that machine.

So we fired it up and within about 2-3 minutes he managed to shoot PID 1 or something like that as the machine rebooted. Boss was not happy.

mm usually PID 1 is unkillable even by root AFAIK.

Actually you're right. He killed the simulation process now that I'm wracking my brain to remember how that actually worked. That then led to us joking about what it'd be like to force-reboot via the killing.

The key detail I'll never forget is his "Oh fuck!" face when it happened.


Not all signals are blocked by init, if init wants to be signaled :)

Don't know why you got DVd. This does nothing on a VM running Kali Linux (systemd):

# kill -9 1

https://www.freedesktop.org/software/systemd/man/systemd.htm... - See signals

It's init dependent (though I think SIGKILL is blocked at the kernel – I wouldn't swear to it) - sigrtmin+3 will likely kill your init process :)

Cor, the locals are a bit harsh here.

I've just run # kill -L - there seem to me rather more signals available than I remember from the past. I'm pretty sure I have never even heard of the ones >16 let alone SIGRT{MIN|MAX}{|x}. I've only ever used SIGHUP or SIGKILL in general and SIGUSR1 for dd progress (which now has --progress at last)

Time to dig out the manual 8)

In Linux, from the manpage:

"The only signals that can be sent to process ID 1, the init process, are those for which init has explicitly installed signal handlers. This is done to assure the system is not brought down accidentally."

And SIGKILL can not be captured (enforced by the kernel):

"The signals SIGKILL and SIGSTOP cannot be caught or ignored."

wtf, hahaha

Haha that made me LOL

They’ll certainly have enough time to decide which proc to kill!

"The mov-only DOOM renders approximately one frame every 7 hours, so playing this version requires somewhat increased patience."

Favourite line.

You could play Doom over email with that kind of time

Did a back-of-the-envelope calculation. Assuming you responded promptly it would take around 504 years to finish the game.

I'd say even over paper mail.

I look forward to this being a benchmark for testing CPU advancement in the years to come.

"It gets 2 frames per hour in Branchless DOOM!"

Compiler FAQ, in it's entirety, says it all:

>Q: Why did you make this? A: I thought it would be funny.

To the author's credit, it really is. I'm still giggling.

"This is thought to be entirely secure against the Meltdown and Spectre CPU vulnerabilities, which require speculative execution on branch instructions."

I must not be understanding Meltdown correctly and I know this is a joke, but I thought it was the malicious code that used speculative execution to (indirectly) read unauthorized memory. So in what sense is Branchless Doom secure against Meltdown?

You don't store your plaintext passwords in a modded version of Doom with a kernel scheduler hack to ensure it mostly gets dispatched on the HyperThread core neighboring the one your browser is getting dispatched onto?


Meltdown doesn't involve speculative execution across branches. It is out of order execution through instructions that will fail.

My understanding of how some of this works is that to handle jmps, the program uses nojmp to become its own signal handler and inserts a faulty mov to raise a SIGSEGV, mov addresses around, and continue as though it was a successful jmp.

And it looks like float and int math is handled with lookup tables. yikes! but impressive!

It's cooler than that, IMO. It uses the SIGSEGV to jump back to the start, once per global iteration. Bit most of the normal 'jumps' are encoded by doing both sides of the work, but using a scratch base address for the storage of the 'not taken' side that ends up not contributing to the actual result.

So it prevents speculative execution of branches, by speculatively executing them itself? lol

Structuring code like this actually has it's purpose. Instruction stream executed by massively parallel SIMD systems (prototypically for 80's supercomputers like Connection Machine, but also for GPUs few generations back) actually works in exactly same way.

It went full circle! The irony is that CPUs already do a ton of speculative branching

I know it's a joke, but this still branches, via an obscure mechanism of faulting, and mov instructions past the faulting instruction will speculatively execute.

That's not quite true... the code (essentially) loops over itself infinitely; the only branch is a loop back to the beginning of the program. Since this branch has only one target, it can be a direct jump (as opposed to an indirect one), meaning it would not use the branch trace buffer, and would not cause speculative execution. The author used faulting for branches only to write the program in all mov instructions; if the last 'mov' is replaced with a direct jmp (which, it looks like, can be done with the --no-mov-loop flag to the compiler), no more speculative execution.

> direct jump

Another important thing here is that it's an unconditional direct jump. A conditional direct jump can still cause speculative execution (e.g. could be vulnerable to Spectre variant 1).

That's a pity, I was hoping that this could be converted to a SIMD architecture allowing you to run hundreds of staggeringly slow doom instances simultaneously on a GPU.

It would be even better if each of the e.g. 1k instances also ran full networking, so you could have 1024 player splitscreen on the GPU.

But the faulting mov instruction is the last instruction in the program. There are no instructions after it to speculatively execute.

One of the main ideas in OoO processors is that the processors are in many ways like dataflow machines, waiting for inputs to become available in order to start (potentially speculative) execution.

So, encoding logic in dependent movs does not inherently address the cache side channel issue.

History based branch prediction one difference with with explicit branches of course.

Uses movfuscator, one of the most painful tools when you're faced with reverse engineering. Also one of the most fun, from the other side of the aisle.

Hopefully this isn't too off topic, but could you explain how to even start approaching this from a RE/malware analysis perspective? I'm guessing there's no drag-and-drop de-obfuscate tool like there is for some of the common .NET obfuscators. Do you just rely on behavioral/dynamic methods?

For even the nastiest of obfuscators there are often attempts at deobfuscators...


I was just looking to replace a specific function call, because the company weren't using VC, and no longer had a working codebase.

So I was hunting a particular pattern to replace, that I knew probably existed. A simple combination of demovfuscator, regex on the "original" asm and eyeballing it succeeded in the end.

Has it ever been used against reverse engineering in the wild? Its lack of compiler features (it's based on lcc IIRC) and the sheer slowness of any operation makes it quite unsuitable for production usage.

In my case, it's sort of a yes and no. I reverse engineered partly, but I was only looking to replace whole function calls.

Company I came in as a contractor to... Sort of repair codebases and retrain staff, had a product they were getting paid buckets to support.

Unfortunately, someone had broken the codebase quite severely, and they weren't using version control. All we had was a movfuscated release (yay for paranoid managers who go to tech conferences), but needed to fix a certain feature ASAP.

So I used demovfuscator, someone's memory of what the code once looked like, and some ASM know-how to tear out references to the old and inject the new. Took a couple months of going nowhere fast, but I was pretty far out of my depth.

They don't use movfuscator anymore.

Every time I see some one say no version control I do a double take. Hopefully you managed to convince them to start it?

... No... The best I could convince was a regular rsync backup at end of day... You win some, you lose some.

To be honest, rsync sounds like a big win in that group..sheesh.

Good work all around.

The comment you're referring to implies yes, and it's often applied to very specific pieces of code instead of the entire program.

I have also seen a similar obfuscation, although perhaps not using The Movfuscator, many years ago in some shareware. You could probably guess which parts of the code used it; it even looks very distinctive when you open the binary in a text editor for a quick glance.

Apparently someone started writing a decompiler for movfuscator that looked for the patterns of movs generated. Probably possible to obfuscate past that still though with an extra layer of indirection whereby all instructions generate the same basic pattern of movs, with one small difference.

i love the design of the movfuscator. Most obfuscation tools tries to add more complexity and branching, in the hope that it will fool the decompiler. But it turns out, having absolutely no branching, and no 'features' makes it so much more of a nightmare to reverse!

I think you can see a similar trend in estoric languages.

Malboge is so complicated that nobody can really write a working program.

Brainfuck is so simple that nearly anybody can write it, but reverse-engineering it is a massive time-sink.

And Tigress is even more painful...

Finally I can play my favourite game without worrying about how Intel has fucked me!

You could run it on a mid-90's CPU without speculative branching and very likely see better performance.

486 or Pentium 90, what this game was made for.

Yea, either one of those would run the game better than this project/slideshow.

Or, alternatively, an x86 emulator on a more trustworthy architecture such as RISC-V, or an x86 implementation on an FPGA.

This is a more interesting idea than I would have thought at first. I wonder if it could be used to discover whether any frame-perfect exploits still remain unfound in Doom. I know that there are people who have played the game frame-by-frame in order to identify possible advantages, this is an interesting take on the idea. The one concern I have is this: does it change the enemy logic significantly? Will the enemies still react in roughly the same way?

I also wonder if it would be at all possible to eventually speed it up (maybe not to actual play speed, but faster) while remaining branchless and free of speculation. It might be interesting to see if research in Doom could provide faster and more secure solutions in the real world.

I was disappointed to find out the screenshot is not a GIF.

You may have to wait 7 hours to confirm that.

Or right-click and "save image as..." That works too.

“The mov-only DOOM renders approximately one frame every 7 hours, so playing this version requires somewhat increased patience.”


That reminds me of playing C&C: Red Alert on a 486/66, I though it was a deliberate strat game.

That was until I installed it on my dad's Pentium 200, the speed increased 3x and I realized it was a completely different beast.

Would be interesting to make this a sort of twitch plays. 7 hour voting windows.

I think it only registers input every 5 or so frames anyway :O

This needs blockchain GPU mining!

Please don't give anyone any ideas.

Another great use case for quantum computing.

As a veteran of early Origin games, I'm up for that. I ran Ultima VII on a DoubleSpaced 386.

Man, I lol'd when I read that line. That was amazing!

Ridiculous. Why does it become so slow though?

The mov-only DOOM renders approximately one frame every 7 hours, so playing this version requires somewhat increased patience.


Well, the compiler works!

This is excellent work.


It's neat, and rather funny.

To see an unmodified real-world C program run on a real-world hardware architecture, completely branchless, is pretty damn neat.

The way it takes hours to render a frame of DOOM is hilarious.

I wonder if the compiler will be optimised in future.

> unmodified

You have to patch the source though...

Oops, you're right! I thought it was only the build system that had to be adjusted.

About half of the code-changes are just to avoid alloca.

Some ask, "why". Others ask, "why not"...

I always liked that quote. In full:

I hear you say "Why?" Always "Why?" You see things; and you say "Why?" But I dream things that never were; and I say "Why not?"


"Science isn't about 'why', it's about 'why not'! Why don't you marry safe science if you love it so much!" --Cave Johnson

It says why right in the compiler FAQ — "I thought it would be funny"

Just being interesting is enough.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact