Movfuscator – Single instruction C compiler (github.com)
197 points by acheron 23 days ago | 60 comments



Of course, no post like this is complete without OISC:

https://en.wikipedia.org/wiki/One_instruction_set_computer

(Then, there's my contribution: NISC -- the Null Instruction Set Computer! It's a revolutionary architecture, in that die sizes and TDP can be scaled down, seemingly arbitrarily, with no loss in functionality. In fact, NISC can transcend the solid state entirely, and be implemented directly on the quantum foam of the vacuum!

Another benefit: NISC machines are entirely immune to all buffer overflow and use after free exploits! In fact, they are free of all exploits, whatsoever!)


Someone made a NISC compiler: https://github.com/jbangert/trapcc


Well, we do start running one instruction; it just never quite finishes. The other OISC systems run way more than one instruction; it's just always the same one with different parameters.


Although that's funny, NISC is already a thing:

https://en.wikipedia.org/wiki/No_instruction_set_computing


Nope, it's trivial to DoS them. Attackers don't even have to do anything.


> Nope, it's trivial to DoS them. Attackers don't even have to do anything.

That's just the "wrong" spin! NISC computers exhibit the same performance characteristics under DDoS attacks of immense scale. Increase the scale of the DDoS attack, and the performance graph remains flat! And by flat, I don't mean with a small slope or slight wobbles. I mean completely flat!


If the attackers can't do anything, how will they DoS them? Wait, I think I understand now how it can be implemented directly on the quantum foam of the vacuum.


It doesn't matter that they can't do anything if they don't need to do anything. Things attackers can't do only matter if they'd need to do them.


Newsflash: NISC system weathers huge DDoS attack with no discernible effects!

> Things attackers can't do only matter if they'd need to do them.

BUT, by the same token, things attackers can do don't matter either! NISC computers: So advanced, they make all attackers meaningless!


Best part of the project is at the bottom, in the FAQ:

> Q: Why did you make this? A: I thought it would be funny.


This is real hacker news.


Without taking anything away from the author, calling MOV a "single instruction" stretches things quite a bit. It's more of a big family of instructions that happen to share an assembly mnemonic.


MOV a "single instruction" stretches things quite a bit. It's more of a big family of instructions that happen to share an assembly mnemonic.

But if you look at the serious attempts to implement OISC, you'll find that the "single instruction" has so many parameters that it might as well be a big family of instructions.
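
To make that concrete: subleq, probably the best-known OISC, takes three operands per instruction. A minimal interpreter sketch in C (my own illustration, nothing from the article):

    #include <stdint.h>

    /* subleq a, b, c: mem[b] -= mem[a]; jump to c if the result
       is <= 0, else fall through. One opcode, three parameters. */
    void subleq_run(int32_t *mem, int32_t ip) {
        while (ip >= 0) {            /* negative target = halt */
            int32_t a = mem[ip], b = mem[ip + 1], c = mem[ip + 2];
            mem[b] -= mem[a];
            ip = (mem[b] <= 0) ? c : ip + 3;
        }
    }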


new OISC instruction: RUN <x86 instruction you want to run> <parameters>


That was like one of my professors' version of Object Oriented programming. He would simply message-pass to his undergrad assistant, Mark. "Mark, code up an X."


Reverse Subtract and Skip if Borrow (RSSB) uses only one parameter and is supposed to be Turing complete. I haven't understood how it works, to be honest.
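
For what it's worth, here's my reading of the usual RSSB description, as a C step function. Treat the semantics as my assumption -- I may have it wrong too:

    #include <stdint.h>

    uint16_t mem[65536];
    uint16_t acc, ip;   /* usually memory-mapped; plain variables here */

    /* rssb x: acc = mem[x] - acc, result stored back to mem[x] too;
       skip the next instruction if the subtraction borrowed. */
    void step(void) {
        uint16_t x = mem[ip++];          /* the single operand */
        int borrow = mem[x] < acc;
        acc = (uint16_t)(mem[x] - acc);
        mem[x] = acc;
        if (borrow)
            ip++;
    }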


This is completely fair, which is why I said not to take "anything away from the author" =)


I wasn't interpreting your comment as a criticism. I was introducing background information, which also shouldn't be interpreted as a criticism.


"No wireless. Less space than a nomad. Lame." -- CmdrTaco

Yeah, this isn't a true OISC setup, but it's very close, and it's still completely impressive that this technique even works. That you can influence all the necessary registers like this and accomplish arithmetic isn't immediately obvious.


My comment should not be interpreted as criticism.


I'd recommend the video presentation linked off the page, where he walks through how he actually turned mov into a machine. It's very weird and interesting!

https://www.youtube.com/watch?v=R7EEoWg6Ekk


I love how he nonchalantly throws in a 0-day x86 PoC at the end.


So, as there's no branch prediction, this is a good defense against Spectre-type attacks?


For a certain value of "good", yes.

> This is thought to be entirely secure against the Meltdown and Spectre CPU vulnerabilities, which require speculative execution on branch instructions.

> The mov-only DOOM renders approximately one frame every 7 hours, so playing this version requires somewhat increased patience.

https://github.com/xoreaxeaxeax/movfuscator/tree/master/vali...


Should be good enough for console players.


I can imagine some timing-specific attacks on memory accesses, but they're likely not as robust as attacks against the branch predictor:

1. This is the simplest one - if the memory being accessed is in a cache (L1/L2, or its page is in the TLB), the function will take a significantly shorter time to execute. If movfuscator achieves conditional execution by manipulating index registers to perform idempotent operations, this will be very easy to detect (see the sketch after this list).

2. Prefetching - if movfuscator reads memory sequentially with a detectable stride, prefetching will shorten the execution time.

3. Write combining - if the code writes to nearby addresses (same cache line), the CPU will combine them to a single write. This will cause a measurable timing difference.

EDIT: One more: Store forwarding - if the code writes to a memory address and reads it soon, the CPU may bypass the memory access (and even cache access) completely.
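
To make point 1 concrete, here's a rough sketch of the kind of measurement an attacker would take. It assumes x86 with the GCC/Clang rdtscp and clflush intrinsics; the buffer and everything else are purely illustrative:

    #include <stdio.h>
    #include <stdint.h>
    #include <x86intrin.h>

    /* Time a single load with the timestamp counter. */
    static uint64_t time_load(volatile char *p) {
        unsigned aux;
        uint64_t t0 = __rdtscp(&aux);
        (void)*p;                      /* the load being timed */
        uint64_t t1 = __rdtscp(&aux);
        return t1 - t0;
    }

    int main(void) {
        static char buf[64];
        buf[0] = 1;                    /* warm the cache line */
        printf("cached:  %llu cycles\n", (unsigned long long)time_load(buf));
        _mm_clflush(buf);              /* evict it */
        printf("flushed: %llu cycles\n", (unsigned long long)time_load(buf));
        return 0;
    }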


This is a defense against Spectre-type attacks, but it can't be a "good" defense because it sacrifices too much. The programs written this way are assuredly quite slow.


In a similar vein, here is an example program that runs using only x86 CPU faults.

https://github.com/jbangert/trapcc



Who would win: the world's most popular CISC architecture or one MOVey boi?


This is awesome and it gets posted periodically. Previous discussion:

https://news.ycombinator.com/item?id=12372242


On a slight tangent, I remember minimal lambda languages like Iota, Jot and Zot that I think were derived from Unlambda.


Ha! This is actually pretty darn clever!

I wonder what the final binary sizes are in comparison to normal GCC compilation, as well as execution speed.

I can't believe you made a floating point emulator just for this as well! Quite impressed.


The fact that the floating point library is big enough to be disabled by default tells me you might run out of disk space compiling a single application.

That being said, if this can increase the difficulty of side-channel and branch-prediction attacks, maybe it'd actually make sense for very isolated parts of some services.


> That being said, if this can increase the difficulty of side-channel and branch-prediction attacks, maybe it'd actually make sense for very isolated parts of some services.

I agree; I can see how in certain special circumstances/services this could be very useful.

However, I still expect someone to compile Quake with it by the end of the year.


They've apparently done DOOM already, and it's hours per frame. I don't want to think what Quake would be like.


To be clear, I (the HN submitter) have nothing to do with this, I just found it and thought it was great.


From the example images of original and obfuscated assembly, there seems to be about a sixfold increase in the number of instructions (for that example, at least), so I imagine you'll see a similar increase in binary size.


Why haven't single-instruction-set architectures taken off?

If mov is Turing Complete it seems like there'd be a big win here... You could parallelize this massively.

Edit: can someone explain why this is being downvoted, please? It's a legit question.


They’re impractical and massively inefficient?


Okay?

I'm asking because I don't know; can you elaborate on why that is?

Aren't most modern GPUs similar in that they're designed to just shit out triangles as fast as possible, massively parallel?


Basically, you're going to try to emulate other instructions that you don't have with this one instruction, and that's not going to perform very well because now, instead of many optimized instructions, you have strings of this one instruction in its place. And I don't see any way to parallelize this: you're doing the same thing you always were, just with a bunch more code.
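
To give a flavor of how mov-only arithmetic works at all: as I understand it from the talk, the trick is lookup tables, because indexing a table is just data movement. A rough C rendering of the idea (not the tool's actual code):

    #include <stdio.h>

    /* Precomputed tables: in a mov-only binary these would live in
       the data section; nothing ever "computes" a sum at runtime. */
    static unsigned char sum_lo[256][256];    /* low byte of a+b  */
    static unsigned char sum_carry[256][256]; /* carry out of a+b */

    static void build_tables(void) {
        for (int a = 0; a < 256; a++)
            for (int b = 0; b < 256; b++) {
                sum_lo[a][b]    = (unsigned char)(a + b);
                sum_carry[a][b] = (unsigned char)((a + b) >> 8);
            }
    }

    int main(void) {
        build_tables();
        unsigned char x = 200, y = 100;
        /* The "add": two indexed loads, i.e. two movs. */
        unsigned char lo    = sum_lo[x][y];
        unsigned char carry = sum_carry[x][y];
        printf("%u (carry %u)\n", lo, carry); /* 44 (carry 1) */
        return 0;
    }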


> "can you elaborate on why that is?"

Might as well suggest we use a single letter instead of 26.


     .....   ..  ..
    ..   ..  .. ..
    ..   ..  ....
    ..   ..  .. ..
     .....   ..  ..
The first number is zero, right? Then the first letter of the alphabet is, well, hard to show, because it doesn't print. It's just an empty set, nothing, a stop bit. We actually do use only 1 bit in digital CPUs, but a weird mix of analogue in broadband transmission. I wonder why CPUs don't use ternary or whatever. But then, I also wonder why asynchronous CPUs didn't take off, so don't mind me, just being bored.


.... --- .-- .- -... --- ..- - - .-- ---?


............... .............. .....

............. ......... ....... ........ ....................

....................... ............... ................. ...........


No, modern GPUs run many of the same instructions as CPUs. They have branches and everything. Their main limitation is that groups of threads are bundled together (called warps) and share a program counter, so lots of branching can result in a lot of wasted work if the threads disagree on which branch to take. That, and there's a huge penalty you pay for moving data across the bus to GPU RAM.


Few problems are easily parallelizable. That said, that's not even the issue here. Specialized instructions may be emulated by movs, but the speed loss could never be recouped, even by massive parallelization.


The problem is probably the address space that movs use, instead of specialized registers with optimized pipelining. But internally, many instructions might actually come down to conditional moves. I guess that's either after the microcode is decoded, or, if I guessed wrong about that, then Register Transfer Logic still pretty much sounds like it was based on, well, transfers.


Why would it make sense to use a single instruction?

(Also no, GPUs are not the same at all)


You can perform multiplication by repeated addition, but that is a very inefficient way to multiply. It's the same thing here, where you can replace other instructions with MOV, but the replacement is much slower than the original.
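
In code, the analogy is something like this trivial sketch:

    /* O(b) additions standing in for one MUL instruction --
       the same kind of trade the movfuscator makes everywhere. */
    unsigned mul_by_adding(unsigned a, unsigned b) {
        unsigned acc = 0;
        while (b--)
            acc += a;
        return acc;
    }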


What makes you think this would be easier to parallelize than a traditional application? Just because there is only one kind of instruction used doesn't mean they don't still have to come in the right order!


> What makes you think this would be easier to parallelize than a traditional application?

Exactly such an idea was proposed for parallel signal processing quite a while back, actually. Look up One Instruction Set Computer.


I don't see any material online indicating that programs written for one-instruction-set-computers are more parallelizable than programs written for traditional computers. In fact, here is someone claiming the opposite:

> The disadvantage of an MISC is that instructions tend to have more sequential dependencies, reducing overall instruction-level parallelism.

https://ipfs.io/ipfs/QmXoypizjW3WknFiJnKLwHCnL72vedxjQkDDP1m...


Also note that the idea didn't take off.


> If mov is Turing Complete it seems like there'd be a big win here... You could parallelize this massively.

Actually, One Instruction Set Computers were proposed for this very application! (Also mentioned elsewhere, but it's applicable in a serious way here.)

https://en.wikipedia.org/wiki/One_instruction_set_computer


At first look, the code produced by Movfuscator is mind-boggling. But despite its perceived entanglement, there exist some deobfuscators that bring the code back to a Harvard (i.e., human-readable) model.

TL;DR: obfuscation by translating a program to "mov" instructions is susceptible to static analysis and thus fully reversible.


> there exist some deobfuscators that bring the code back

Care to elaborate? The author of Movfuscator is very experienced and capable at reverse engineering. In one of his video talks he hits some Movfuscated code with some tools and says he doesn't know of anything that can deobfuscate it. That may have changed since then -- I'd be curious to know.


There's the demovfuscator: https://kirschju.re/demov


Thanks! That’s really cool.



