Hacker News new | past | comments | ask | show | jobs | submit login
Hiding messages in x86 binaries using semantic duals (yossarian.net)
169 points by todsacerdoti on Aug 16, 2020 | hide | past | favorite | 55 comments



I've been reverse-engineering the 8086 lately, and I recently came across the exact register-control circuitry that makes this steganography possible.

The steganography takes advantage of x86 instructions where you can swap the source and destination by replacing an instruction like hex 31 c0 with 33 c0. The difference is bit 1, the direction bit, supported by many instructions.

Internally, the 8086 has 5-bit registers that specify the source and destination, typically a register. [1] The source and destination register get their value either from the register specification in the instruction (bits 5-3 or bits 0-2), or values in the microcode.

The clever part is the outputs of the source and destination registers go through multiplexers that can swap them. If the instruction has a direction bit, and the direction bit is set, then accesses to the source and destination registers are swapped.

The point of this is that the microcode doesn't know anything about direction swapping, so it is implemented "for free" as far as microcode size (but with the addition of the swapping circuit).

The 8086 has a "Group PLA" that categorizes instructions into groups; one of these groups is "instructions that have a direction bit". This prevents direction swapping from happening for instructions where it is not supported.

I hope this explanation makes sense; it should probably be a blog post :-)

[1] You might wonder why source and destination are specified with 5 bits when the instructions use 3 bits to specify the register. The first reason is that many registers can be accessed as half-registers, so you need another bit. (This bit comes from the byte/word specification bit in instructions.) Second, this mechanism is used to access the other 8086 registers, not just the general-purpose registers. Third, there are also invisible temporary registers that also need accessing. Thus, the internal register specifications are 5 bits.


This is a fantastic explanation, thank you! I would love a detailed blog post on it.


Thanks for writing this up Ken.

Strangely enough, I was researching a related topic a few days ago. I was wondering if disassemblers did the right thing with these opcodes. I was wondering in particular if they could reproduce the exact binary after disassembly.

It turns out that most assemblers do not have a specific way to make a distinction. The gnu assembler (gas) does have a ‘.s’ suffix for mnemonics to differentiate.


By the way, since you asked what people would like reading about the 8086, I would greatly enjoy an article focusing on its microcode, both what it implements and how it's processed (not necessarily in one article).


It might be worth noting that strictly speaking this isn't safe to apply to an arbitrary binary. That is 31 c0 and 33 c0 might behave identically when executed as xor, but they could, for example, also be part of a larger instruction where the swap results in a change in behavior.

Presumably this implementation disassembles the binary which gives a very high probability of determining whether these bytes are only executed as xor, but this isn't in general enough since the way the program is executed at runtime may not correspond to disassembly [0].

Other patterns that might trip this up are programs that re-use some program bytes as constants (e.g., if you needed the 16-bit constant 0xc031 you could just point to this existing pattern in the instruction stream [1]) or which otherwise examine or modify their instruction bytes at runtime.

Now of course this caveat doesn't apply to almost any "vanilla program" compiled by a normal compiler. It's only likely to come up in hand-written assembly or as a result of some tool or process that purposes does this weird stuff (e.g., as an anti-debugging measure). 99.99% of the time these swaps will work out fine.

---

[0] Further, you can't even necessarily unambiguously assign a byte to a particular instruction, even based on the dynamic behavior, since a given byte might be used in two different instructions based on an earlier jump which result in different parsed boundaries. This is really really unusual and usually in the realm of demos or anti-debugging techniques, etc.

[1] This is not actually a good idea for performance because modern CPUs hate it when you use the same bytes for both code and data.


> It might be worth noting that strictly speaking this isn't safe to apply to an arbitrary binary. That is 31 c0 and 33 c0 might behave identically when executed as xor, but they could, for example, also be part of a larger instruction where the swap results in a change in behavior.

> Presumably this implementation disassembles the binary which gives a very high probability of determining whether these bytes are only executed as xor, but this isn't in general enough since the way the program is executed at runtime may not correspond to disassembly [0].

Yep: steg86 uses iced[1] internally to decode and re-encode instruction sequences. Arbitrarily replacing `31 C0` with `33 C0` (or vice versa) in the instruction text would certainly not go well.

And yes: it's possible to contrive pathological programs that make these kinds of patches impossible to perform safely. But those fall under the "what did you expect" support category, at least in terms of the current implementation. A compiler-based implementation would certainly have an easier time.

[1]: https://github.com/0xd4d/iced


Yes, I think the most practical scenarios this would fail in the real world would be:

1) Signed binaries or other similar concepts such as "anti-cheat" stuff that tries to detect binary modification.

2) Code that actually does do really weird stuff in order to fool static disassembly, e.g., obfuscation which is not uncommon in games and some other binary types.

Of course, if you created the binary yourself, you'd be aware of all these gotchas, so this would only come as a surprise if you were applying it as a "third party" to some arbitrary binary.


>It might be worth noting that strictly speaking this isn't safe to apply to an arbitrary binary. That is 31 c0 and 33 c0 might behave identically when executed as xor, but they could, for example, also be part of a larger instruction where the swap results in a change in behavior.

Technically speaking, you can't apply any transformation to arbitrary binaries, because then the encoding changes and the code can look back at itself. This actually features heavily in a recent Hacker News article https://news.ycombinator.com/item?id=23557998 about reverse engineering the Snapchat app, where the code regularly took checksums of itself to frustrate debuggers and take different codepaths if tampering (such as to insert breakpoints) was detected.

Assembly is kind of, in a sense, the ultimate homoiconic programming language.


I tried to cover that in the catchall "[...] or which otherwise examine [...] their instruction bytes at runtime".

> Technically speaking, you can't apply any transformation to arbitrary binaries

Well in some cases it may be possible to prove that a binary doesn't do anything weird (given the entry point which is usually fixed by convention, e.g., _start), although that is undecidable in general.


This seems like it'd be almost comically easy to detect, because normal compilers and assemblers would consistently use either 31 C0 or 33 C0 all the time, rather than bouncing back and forth between them within a single program.


Author here: it is indeed easy to detect. I didn't base my work off of Hydan but it uses a similar strategy, and there are several[1][2] public steganalyses of it.

Edit: I previously claimed that this method might be easy to deny. It isn't, and I've added a note to the post as well.

[1]: https://cosec.inf.uc3m.es/~juan-tapiador/papers/2009sec.pdf

[2]: https://www.sans.org/reading-room/whitepapers/stenganography...


It's not easy to deny. Normal compilers flat out don't produce binaries that look like this, so if a binary looks like this then you definitely know something fishy is going on even if you can't decrypt it.

Contrast with steganography in least significant image bits; you genuinely cannot determine at all if an image is carrying hidden encrypted data. That's what easy to deny looks like.


> It's not easy to deny. Normal compilers flat out don't produce binaries that look like this, so if a binary looks like this then you definitely know something fishy is going on even if you can't decrypt it.

Yes, you're right. I've updated the post to include a note at the bottom saying that this technique is difficult to provide deniability with.

> Contrast with steganography in least significant image bits; you genuinely cannot determine at all if an image is carrying hidden encrypted data. That's what easy to deny looks like.

This might be a misunderstanding on my part, but I thought that LSB-style steganography had been broken by both statistical analyses and ML models for a while now[1]. But it's possible that these methods only work on plaintext messages; I haven't looked deeply into it.

[1]: https://github.com/b3dk7/StegExpose


You are right, in the sense that there are companies out there (I used to work for one of them) that do image, video and audio watermarking. In the simplest terms that's also adding imperceptible "noise" that statistically builds up to a few bits of data.


I am not sure to understand how the program can make a distinction between 33 C0 (because it is written as 33 C0) and 33 C0 (because it was written as 31 C0 and was later changed by steg86 to 33 C0)?

Or, seen the other way, maybe steg86 can "extract" (from already existing untouched binaries) secret messages that were never intentionally written?


This is a really good question!

steg86 currently embeds a 32-bit header for itself. You can see the relevant constants here[1]. If the header doesn't validate during extraction, steg86 fails instead of extracting potential garbage.

[1]: https://github.com/woodruffw/steg86/blob/master/src/steg86/b...


OK, but how is the 32-bit header itself embedded?

The header is a good thing for preventing accidental extraction from binaries not treated with steg86, but still you need to modify 4 bytes (or 4 instructions) to embed this header, so - at least theorically - the possibility of a "collision" or of a false positive seems relatively high to me.


> OK, but how is the 32-bit header itself embedded?

The header is embedded according to the same rules as the rest of the message: it's treated as a bitstring, and flippable instructions are flipped appropriately to encode it.

> the possibility of a "collision" or of a false positive seems relatively high to me.

As others have pointed out, compilers (and assemblers) tend to stick to a single selection choice for register-to-register ops. Even in the case where a compiler writer flips between them randomly, their 32 random choices would have to align precisely with the expected header. It's certainly not impossible, but pretty unlikely. It's also completely remediable with a CRC32 or similar field tacked onto the header; I just didn't think the likelihood warranted that for the initial design.


4 bytes of the encoded header isn't 4 instructions, it's 32 consecutive instructions that each translate to a bit depending on their semantic form.

So there is virtually no chance for a compiler to generate a valid sequence randomly.


I see, each changed instruction gives a single bit (not byte) of information, my bad.


Are there other semantic duals in the instruction sets? If so, when combined with cryptography, it would offer some additional level of plausible deniability over which of the possible messages (due to parameter selection) is the cryptotext.


The two most obvious are:

- Register allocation choices

- Instruction ordering choices


Also conditional branches because you can conceivably do the comparison, branch, and fall-through in several different ways. Consider the initial case

  cmp ax,bx
  jge foo
  bar:
  ; code that gets run if ax < bx
  foo:
  ; code that gets run if ax >= bx
You could change this to cmp bx,ax and then change the jge to jl, or you could change either one alone and reverse the order of the bar and foo code blocks, or you could change both and not reverse the order of the bar and foo code blocks. (You have fewer choices if you are doing the comparison as the exit condition for a loop, except sometimes compilers will put the loop continue condition as an unconditional jump after bar or foo, in which case you have even more choices, like whether to do that or not!)


But that's not a fundamental issue of the idea and rather a quirk in the steg86 implementation, right?


Which part? The header is specific to steg86, and some of its constraints (like the maximum message size) are a product of that. Otherwise, there aren't any quirks in the implementation that I'm currently aware of.


The same problem is faced when you present basically any kind of encryption scheme with random data. It would be easy to fix that kind of problem by adding a checksum to the secret message.

Besides, in what scenario would you be decrypting arbitrary binaries with this, such that it would be a problem for you to have false positives? Just make sure to use it only on files which you know contain secret messages.


>Besides, in what scenario would you be decrypting arbitrary binaries with this, such that it would be a problem for you to have false positives? Just make sure to use it only on files which you know contain secret messages.

Well, more or less cryptography/steganography can be used to either store "secrets" or to communicate them.

If I used something like this to communicate, I would probably tell the other part to download (say) a .iso full of binaries, as opposed to a single binary.


I think from the perspective of steganography the question is also: How often is byte code checked for anything like this?

Steganography is like a magician's act; it's only undetected if you don't go and look for it. I think, in principle, if you are the only one using the tool, and didn't tell anyone, why would anyone look for it?


This particular trick is easy to detect when you're looking for it, but that doesn't mean that it's easy to realise that you should look for this trick (I'm not about to make "check the assembler for strange compiler choices" part of my workflow for installation of a new binary), or that all possible such tricks could be so easily caught.


Surely there is some way to verify if a build tempered with, cough file integrity hashes.


Which is a point made in the article - the technique is not meant to be hard to detect a la cryptography, it's meant to be non-obvious to spot. By exploiting the fact that nobody is going to look through (or edit) machine instructions for patterns in register bytes, you can include a message that is easy to overlook.


I remember that one assembler claimed to do something similar to sort of watermark the binaries. From what I remember it did it only in the otherwise fully functional unregistered version to encourage people to pay for it. From my memory it was NASM but I cannot find anything about it ever being paid for. Does anyone remember a shareware assembler from mid nineties that did this?

EDIT: Another comment already mentioned A86. Probably that was the one.



I can confirm. The author was an Intel engineer who was dissatisfied with how his senior engineers did the DOS EXE format in the intro of the docs of his assembler.

A86/D86 where fantastic, with extra bonus points for the docs.


Decades ago, I remember figuring out quite a bit of that, and here are my original notes from that, including tables that show the pattern clearly:

    MOV rA, rB:
         B: AL AH BL BH CL CH DL DH
    A:      1  1  0  0  1  1  0  0
    AL  1   ** ** 8A 8A ** ** 8A 8A
    AH  1   ** ** 8A 8A ** ** 8A 8A
    BL  0   8A 8A ** ** 8A 8A ** **
    BH  0   8A 8A ** ** 8A 8A ** **
    CL  1   ** ** 8A 8A ** ** 8A 8A
    CH  1   ** ** 8A 8A ** ** 8A 8A
    DL  0   8A 8A ** ** 8A 8A ** **
    DH  0   8A 8A ** ** 8A 8A ** **

    * = "reversed" opcode (88)

    MOV rA, rB ( word regs ):
         B: AX BX CX DX SP BP SI DI
    A:      1  0  1  0  1  1  0  0
    AX 1    ** 8B ** 8B ** ** 8B 8B
    BX 0    8B ** 8B ** 8B 8B ** **
    CX 1    ** 8B ** 8B ** ** 8B 8B
    DX 0    8B ** 8B ** 8B 8B ** **
    SP 1    ** 8B ** 8B ** ** 8B 8B
    BP 1    ** 8B ** 8B ** ** 8B 8B
    SI 0    8B ** 8B ** 8B 8B ** **
    DI 0    8B ** 8B ** 8B 8B ** **

    * = "reversed" opcode (89)

    The above tables apply for the following two-operand instructions:

        ADD             OR
        ADC             SBB
        AND             SUB
        XOR             CMP

    For TEST and XCHG, which are commutative, A86 always puts the first
    operand in the r/m field if possible, while MASM puts it in the reg
    field if the first operand is a register.


Wow! Back then, I was not quite sure if they had really implemented it or if they just claimed they had to encourage people to pay.

I would never have imagined that I'd get an answer to that question a quarter of a century later. Thanks for the comment!


It's been so long that I read that remark in the manual and I'm glad I wasn't just imagining it;-)


About ten years ago, the same question was raised on StackOverflow: https://stackoverflow.com/questions/2760794/x86-cmp-instruct...

A comment was made: “These 1-bit degrees of freedom also provide a covert channel for compilers to "phone home" - they can "watermark" the binaries they produce, and the compiler vendor can ask you to please explain if they find your software with their watermark, but with no license on file.” – Bernd Jendrissek


This would provide an interesting way of signing a binary. Encode using all 0s, then sign, then encode the signature


Cool project. I like to hide data in audio files with Deepsound https://deepsound.soft112.com/


I wrote a blog post on the weak cryptography used by Deepsound:

https://ryan.govost.es/2018/03/09/deepsound.html


That's a great writeup. Is it possible to create a really long passphrase whose hash can't be reversed easily? Perhaps a diceware passphrase with six randomly chosen words?


The difficulty of breaking Deepsound is basically equivalent to the difficulty of reversing a SHA-1 hash. For dictionary words and shorter passwords, consider them broken instantaneously through pre-computed lookup tables.

For more complex passphrases (and remember, only the first 32 characters count here), exponential growth probably works in your favor, even with today's Bitcoin-fueled hyper-accelerated SHA-1 implementations.

Even then, the scheme where they use the password directly as the AES key is flawed. For example, in ASCII, every octet's most-significant bit is zero, so 32 bits of your AES key are fixed. I don't know if this enables practical attacks, but anyone who cares about securing their data shouldn't rely on amateur cryptography like this.

Edit: Oh right, and aside from the password aspect, it uses ECB mode for the encrypted content. That’s not good.


For those who are curious about ECB: see the picture & encrypted picture of Tux on https://en.m.wikipedia.org/wiki/Block_cipher_mode_of_operati...


Just encrypt your data first before giving to DeepSound.


An old test suite (Plum Hall?) used to warn it was generated uniquely per-client, with steganographic watermarking at the C source level.


A86 reportedly did something similar: https://en.wikipedia.org/wiki/A86_(software)


Yes, I remember A86 documentation saying it did that to every program it assembled, to watermark that A86 was used to assemble it.

In the 80s. Blast from the past!


You could do stenography with ordering of function/data sections VS an expected permutation.


Does anyone know if other instruction sets like wasm have these kind of dual?


I think it is better to call this a noncommutative operation and phrase it as such. Duals are often in fact, different things.

But I think what you are doing is pretty cool! I don't know much about machine code, but whenever XOR becomes noncommutative, for whatever reason, but executes commutatively, then this should become possible.

Of course, by the way, XOR also has the vanilla ability to reveal information through: plaintext XOR cypher = cyphertext and then you do cyphertext XOR cypher to get back plaintext. In this case, XOR is commutative, as we are only looking at execution, and moreover each binary string is it's own inverse: thing XOR thing = 0-string. So, you can think of XOR as reversible sequences of encypherment.


I'm not sure what xor's commutativity has to do with this?


OP is treating XOR as though it were not commutative between registers and memory:

> Consequently, there are actually two ways to encode xor eax, eax:

    ; r/m32, r
    31 C0

    ; r, r/m32
    33 C0
So, what I am saying is that write(memory,register) != write(register,memory) in the byte code, meaning that writing the XOR itself is a noncommutative operation. The machine running the code does however treat both option ultimately to get the same result.

The XOR algorithm remains unchanged and remains commutative. The execution routine of the XOR is noncommutative.

So, there are two operations here: XOR and executeXOR and OPs article is about using the machine level noncommutativity in the form of different byte strings for executeXOR. The machine still thinks that is is commutative for the end result of the calculation/routine.

My comment about cypher XOR plaintext = cyphertext is just a general comment about XOR.


I think you might have misunderstood the modr/m encoding: steg86 never touches memory-register or register-memory operations, only register-register ones. Nothing about this either requires or implies commutativity.


Alright, but the principle still holds:

You have some action act(x,y). The encoding in the bitstream can either differ for act(x,y) or act(y,x) or it is the same.

The original article states that the effect of act(x,y) and act(y,x) is the same. But the bitstream may differ. So it is commutative in the operations of the computer, but not in the bitstream instruction.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: