
Hiding messages in x86 binaries using semantic duals - todsacerdoti
https://blog.yossarian.net/2020/08/16/Hiding-messages-in-x86-binaries-using-semantic-duals
======
kens
I've been reverse-engineering the 8086 lately, and I recently came across the
exact register-control circuitry that makes this steganography possible.

The steganography takes advantage of x86 instructions where you can swap the
source and destination by replacing an instruction like hex 31 c0 with 33 c0.
The difference is bit 1, the direction bit, supported by many instructions.
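
To make the direction bit concrete, here is a minimal sketch (not steg86's code; `semantic_dual` is a name made up for illustration). It assumes a register-to-register instruction whose opcode has a direction bit: toggling bit 1 of the opcode and swapping the reg and r/m fields of the ModR/M byte gives the other encoding of the same operation. For `xor eax, eax` both fields name the same register, so only the opcode byte changes.

```python
def semantic_dual(opcode, modrm):
    """Return the alternate (opcode, modrm) encoding of a register-register instruction.

    Assumes `opcode` has a direction bit (e.g. 0x31/0x33 for XOR) and that
    `modrm` has mod == 0b11, i.e. both operands are registers.
    """
    if modrm >> 6 != 0b11:
        raise ValueError("only register-register forms are handled")
    reg = (modrm >> 3) & 0b111
    rm = modrm & 0b111
    flipped_opcode = opcode ^ 0b10                  # toggle the direction bit (bit 1)
    swapped_modrm = (0b11 << 6) | (rm << 3) | reg   # swap the reg and r/m fields
    return flipped_opcode, swapped_modrm


# xor eax, eax: 31 C0 <-> 33 C0
print([hex(b) for b in semantic_dual(0x31, 0xC0)])
# xor eax, ebx: 31 D8 (r/m=eax, reg=ebx) <-> 33 C3 (reg=eax, r/m=ebx)
print([hex(b) for b in semantic_dual(0x31, 0xD8)])
```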

Internally, the 8086 has 5-bit registers that specify the source and
destination, typically a register. [1] The source and destination registers get
their values either from the register specification in the instruction (bits
5-3 or bits 2-0), or from values in the microcode.

The clever part is the outputs of the source and destination registers go
through multiplexers that can swap them. If the instruction has a direction
bit, and the direction bit is set, then accesses to the source and destination
registers are swapped.

The point of this is that the microcode doesn't know anything about direction
swapping, so it is implemented "for free" as far as microcode size (but with
the addition of the swapping circuit).
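
A toy software model of that idea, purely as a sketch: the "microcode" step below is written once and never looks at the direction bit, and the swap happens outside it. The function name, the dict-of-registers representation, and which field feeds which selector are all assumptions for illustration, not a description of the actual 8086 circuitry.

```python
def run_alu_op(regs, reg_field, rm_field, direction_bit, op):
    """Toy model: the 'microcode' only ever sees a source and a destination selector."""
    src, dst = reg_field, rm_field   # d=0: reg field is the source, r/m the destination
    if direction_bit:
        src, dst = dst, src          # the multiplexer swap, outside the microcode
    # "Microcode": identical regardless of direction.
    regs[dst] = op(regs[dst], regs[src])
    return regs


xor = lambda a, b: a ^ b
# xor ax, bx encoded with d=0 (reg=bx, r/m=ax) and with d=1 (reg=ax, r/m=bx)
print(run_alu_op({"ax": 0b1100, "bx": 0b1010}, "bx", "ax", 0, xor))
print(run_alu_op({"ax": 0b1100, "bx": 0b1010}, "ax", "bx", 1, xor))
```

Both calls leave the same result in `ax`, which is the property the steganography relies on.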

The 8086 has a "Group PLA" that categorizes instructions into groups; one of
these groups is "instructions that have a direction bit". This prevents
direction swapping from happening for instructions where it is not supported.

I hope this explanation makes sense; it should probably be a blog post :-)

[1] You might wonder why source and destination are specified with 5 bits when
the instructions use 3 bits to specify the register. The first reason is that
many registers can be accessed as half-registers, so you need another bit.
(This bit comes from the byte/word specification bit in instructions.) Second,
this mechanism is used to access the other 8086 registers, not just the
general-purpose registers. Third, there are also invisible temporary registers
that also need accessing. Thus, the internal register specifications are 5
bits.

~~~
woodruffw
This is a fantastic explanation, thank you! I would _love_ a detailed blog
post on it.

------
BeeOnRope
It might be worth noting that _strictly speaking_ this isn't safe to apply to
an arbitrary binary. That is 31 c0 and 33 c0 might behave identically when
executed as xor, but they could, for example, also be part of a larger
instruction where the swap results in a change in behavior.

Presumably this implementation disassembles the binary which gives a very high
probability of determining whether these bytes are _only_ executed as xor, but
this isn't in general enough since the way the program is executed at runtime
may not correspond to disassembly [0].

Other patterns that might trip this up are programs that re-use some program
bytes as constants (e.g., if you needed the 16-bit constant 0xc031 you could
just point to this existing pattern in the instruction stream [1]) or which
otherwise examine or modify their instruction bytes at runtime.

Now of course this caveat doesn't apply to almost any "vanilla program"
compiled by a normal compiler. It's only likely to come up in hand-written
assembly or as a result of some tool or process that purposely does this weird
stuff (e.g., as an anti-debugging measure). 99.99% of the time these swaps
will work out fine.
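
A small sketch of the hazard, using a hand-made byte sequence (32-bit mode assumed): a blind search-and-replace of `31 C0` also hits the same bytes inside the immediate of a `mov`, changing the constant that gets loaded.

```python
code = bytes([
    0x31, 0xC0,                    # xor eax, eax
    0xB8, 0x31, 0xC0, 0x00, 0x00,  # mov eax, 0x0000c031 (immediate contains 31 C0)
])

# Naive patch: replace the byte pattern everywhere, ignoring instruction boundaries.
naive = code.replace(b"\x31\xc0", b"\x33\xc0")

old_imm = int.from_bytes(code[3:7], "little")
new_imm = int.from_bytes(naive[3:7], "little")
print(hex(old_imm), hex(new_imm))  # the mov now loads a different constant
```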

---

[0] Further, you can't even necessarily unambiguously assign a byte to a
particular instruction, even based on the dynamic behavior, since a given byte
might be used in two _different instructions_ based on an earlier jump which
result in different parsed boundaries. This is _really really_ unusual and
usually in the realm of demos or anti-debugging techniques, etc.

[1] This is not actually a good idea for performance because modern CPUs hate
it when you use the same bytes for both code and data.

~~~
woodruffw
> It might be worth noting that strictly speaking this isn't safe to apply to
> an arbitrary binary. That is 31 c0 and 33 c0 might behave identically when
> executed as xor, but they could, for example, also be part of a larger
> instruction where the swap results in a change in behavior.

> Presumably this implementation disassembles the binary which gives a very
> high probability of determining whether these bytes are only executed as
> xor, but this isn't in general enough since the way the program is executed
> at runtime may not correspond to disassembly [0].

Yep: steg86 uses iced[1] internally to decode and re-encode instruction
sequences. Arbitrarily replacing `31 C0` with `33 C0` (or vice versa) in the
instruction text would certainly not go well.

And yes: it's possible to contrive pathological programs that make these kinds
of patches impossible to perform safely. But those fall under the "what did
you expect" support category, at least in terms of the current implementation.
A compiler-based implementation would certainly have an easier time.

[1]: [https://github.com/0xd4d/iced](https://github.com/0xd4d/iced)

~~~
BeeOnRope
Yes, I think the most likely scenarios where this would fail in the real world
are:

1) Signed binaries or other similar concepts such as "anti-cheat" stuff that
tries to detect binary modification.

2) Code that actually does do really weird stuff in order to fool static
disassembly, e.g., obfuscation which is not uncommon in games and some other
binary types.

Of course, if you created the binary yourself, you'd be aware of all these
gotchas, so this would only come as a surprise if you were applying it as a
"third party" to some arbitrary binary.

------
josephcsible
This seems like it'd be almost comically easy to detect, because normal
compilers and assemblers would consistently use either 31 C0 or 33 C0 all the
time, rather than bouncing back and forth between them within a single
program.
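
A rough sketch of such a detector, assuming the instructions have already been decoded into `(opcode, modrm)` pairs by a disassembler. The function name and the choice of opcode families (XOR and MOV, not exhaustive) are illustrative.

```python
from collections import Counter

# Opcodes with the direction bit cleared; each has a dual at (opcode | 0b10).
BASE_OPCODES = (0x31, 0x89)  # XOR r/m32, r32 and MOV r/m32, r32


def mixed_form_ratio(instructions):
    """Per opcode family, return the share of the minority encoding.

    Only register-register forms (mod == 0b11) are counted. A ratio near 0.0
    looks like ordinary toolchain output; values approaching 0.5 suggest the
    form was chosen per instruction.
    """
    counts = Counter()
    for opcode, modrm in instructions:
        if (opcode & ~0b10) in BASE_OPCODES and modrm >> 6 == 0b11:
            counts[opcode] += 1
    ratios = {}
    for base in BASE_OPCODES:
        a, b = counts[base], counts[base | 0b10]
        if a + b:
            ratios[base] = min(a, b) / (a + b)
    return ratios


clean = [(0x31, 0xC0)] * 10
mixed = [(0x31, 0xC0)] * 5 + [(0x33, 0xC0)] * 5
print(mixed_form_ratio(clean), mixed_form_ratio(mixed))
```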

~~~
woodruffw
Author here: it is indeed easy to detect. I didn't base my work off of Hydan
but it uses a similar strategy, and there are several[1][2] public
steganalyses of it.

Edit: I previously claimed that this method might be easy to deny. It isn't,
and I've added a note to the post as well.

[1]: [https://cosec.inf.uc3m.es/~juan-tapiador/papers/2009sec.pdf](https://cosec.inf.uc3m.es/~juan-tapiador/papers/2009sec.pdf)

[2]: [https://www.sans.org/reading-room/whitepapers/stenganography...](https://www.sans.org/reading-room/whitepapers/stenganography/paper/32839)

~~~
jaclaz
I'm not sure I understand how the program can distinguish between 33 C0
(because it was written as 33 C0) and 33 C0 (because it was written as 31 C0
and was later changed by steg86 to 33 C0)?

Or, seen the other way, maybe steg86 can "extract" (from already existing
untouched binaries) secret messages that were never intentionally written?

~~~
woodruffw
This is a really good question!

steg86 currently embeds a 32-bit header for itself. You can see the relevant
constants here[1]. If the header doesn't validate during extraction, steg86
fails instead of extracting potential garbage.

[1]:
[https://github.com/woodruffw/steg86/blob/master/src/steg86/b...](https://github.com/woodruffw/steg86/blob/master/src/steg86/binary.rs#L12-L24)

~~~
jaclaz
OK, but how is the 32-bit header itself embedded?

The header is a good thing for preventing accidental extraction from binaries
not treated with steg86, but still you need to modify 4 bytes (or 4
instructions) to embed this header, so - at least theoretically - the
possibility of a "collision" or of a false positive seems relatively high to
me.

~~~
paraboul
4 bytes of the encoded header isn't 4 instructions, it's 32 consecutive
instructions that each translate to a bit depending on their semantic form.

So there is virtually no chance for a compiler to generate a valid sequence
randomly.
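
A minimal sketch of the one-bit-per-instruction idea. The magic value, the bit-to-form mapping (direction bit = message bit), and the MSB-first order are all assumptions made for illustration, not steg86's actual format. For simplicity it tracks opcodes only; a real tool would also swap the ModR/M fields.

```python
HYPOTHETICAL_MAGIC = 0x53544547  # "STEG" in ASCII; NOT steg86's real header value


def embed_word(opcodes, word, nbits=32):
    """Rewrite the first `nbits` opcodes so their direction bits spell `word` (MSB first)."""
    out = list(opcodes)
    for i in range(nbits):
        bit = (word >> (nbits - 1 - i)) & 1
        out[i] = (out[i] & ~0b10) | (bit << 1)
    return out


def extract_word(opcodes, nbits=32):
    """Read one bit per instruction from the direction bit of each opcode."""
    value = 0
    for op in opcodes[:nbits]:
        value = (value << 1) | ((op >> 1) & 1)
    return value


compiler_output = [0x31] * 32  # a typical toolchain picks one form consistently
stego = embed_word(compiler_output, HYPOTHETICAL_MAGIC)
print(hex(extract_word(compiler_output)))          # no header here, so extraction would be refused
print(extract_word(stego) == HYPOTHETICAL_MAGIC)   # header validates
```

Even if the 32 forms were independent coin flips, a specific 32-bit header would appear by chance with probability 2^-32; consistent compiler output makes an accidental match less likely still.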

~~~
jaclaz
I see, each changed instruction gives a single bit (not byte) of information,
my bad.

------
weinzierl
I remember that one assembler claimed to do something similar to sort of
watermark the binaries. From what I remember it did it only in the otherwise
fully functional unregistered version to encourage people to pay for it. From
my memory it was NASM but I cannot find anything about it ever being paid for.
Does anyone remember a shareware assembler from mid nineties that did this?

EDIT: Another comment already mentioned A86. Probably that was the one.

~~~
jlmcgraw
You’re thinking of A86:

[https://en.m.wikipedia.org/wiki/A86_(software)](https://en.m.wikipedia.org/wiki/A86_\(software\))

~~~
userbinator
Decades ago, I remember figuring out quite a bit of that, and here are my
original notes from that, including tables that show the pattern clearly:

    
    
        MOV rA, rB:
             B: AL AH BL BH CL CH DL DH
        A:      1  1  0  0  1  1  0  0
        AL  1   ** ** 8A 8A ** ** 8A 8A
        AH  1   ** ** 8A 8A ** ** 8A 8A
        BL  0   8A 8A ** ** 8A 8A ** **
        BH  0   8A 8A ** ** 8A 8A ** **
        CL  1   ** ** 8A 8A ** ** 8A 8A
        CH  1   ** ** 8A 8A ** ** 8A 8A
        DL  0   8A 8A ** ** 8A 8A ** **
        DH  0   8A 8A ** ** 8A 8A ** **
    
        ** = "reversed" opcode (88)
    
        MOV rA, rB ( word regs ):
             B: AX BX CX DX SP BP SI DI
        A:      1  0  1  0  1  1  0  0
        AX 1    ** 8B ** 8B ** ** 8B 8B
        BX 0    8B ** 8B ** 8B 8B ** **
        CX 1    ** 8B ** 8B ** ** 8B 8B
        DX 0    8B ** 8B ** 8B 8B ** **
        SP 1    ** 8B ** 8B ** ** 8B 8B
        BP 1    ** 8B ** 8B ** ** 8B 8B
        SI 0    8B ** 8B ** 8B 8B ** **
        DI 0    8B ** 8B ** 8B 8B ** **
    
        ** = "reversed" opcode (89)
    
        The above tables apply for the following two-operand instructions:
    
            ADD             OR
            ADC             SBB
            AND             SUB
            XOR             CMP
    
        For TEST and XCHG, which are commutative, A86 always puts the first
        operand in the r/m field if possible, while MASM puts it in the reg
        field if the first operand is a register.
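
The pattern in the two tables can be restated as a rule: each register carries the 0/1 flag shown in the header row and column, and the "reversed" opcode appears exactly when the two operands' flags match. A short sketch that reproduces the table entries from those flags (the function name is made up; the flags are copied from the notes above):

```python
BYTE_REGS = ["AL", "AH", "BL", "BH", "CL", "CH", "DL", "DH"]
BYTE_BITS = [1, 1, 0, 0, 1, 1, 0, 0]
WORD_REGS = ["AX", "BX", "CX", "DX", "SP", "BP", "SI", "DI"]
WORD_BITS = [1, 0, 1, 0, 1, 1, 0, 0]


def a86_mov_opcode(dst, src):
    """Opcode the tables show for a register-to-register MOV dst, src."""
    if dst in BYTE_REGS:
        regs, bits, normal, reversed_op = BYTE_REGS, BYTE_BITS, 0x8A, 0x88
    else:
        regs, bits, normal, reversed_op = WORD_REGS, WORD_BITS, 0x8B, 0x89
    same_flag = bits[regs.index(dst)] == bits[regs.index(src)]
    return reversed_op if same_flag else normal


print(hex(a86_mov_opcode("AL", "BL")), hex(a86_mov_opcode("AL", "CL")))
print(hex(a86_mov_opcode("AX", "BX")), hex(a86_mov_opcode("AX", "SP")))
```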

~~~
weinzierl
Wow! Back then, I was not quite sure if they had really implemented it or if
they just claimed they had to encourage people to pay.

I would never have imagined that I'd get an answer to that question a quarter
of a century later. Thanks for the comment!

------
praalhans
About ten years ago, the same question was raised on StackOverflow:
[https://stackoverflow.com/questions/2760794/x86-cmp-instruct...](https://stackoverflow.com/questions/2760794/x86-cmp-instruction-difference)

A comment was made: “These 1-bit degrees of freedom also provide a covert
channel for compilers to "phone home" - they can "watermark" the binaries
they produce, and the compiler vendor can ask you to please explain if they
find your software with their watermark, but with no license on file.” – Bernd
Jendrissek

------
steamraven
This would provide an interesting way of signing a binary. Encode using all
0s, then sign, then encode the signature.
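
A sketch of that three-step flow, under loud assumptions: a truncated SHA-256 digest stands in for a real signature (it is not one), the "binary" is reduced to a list of eligible opcodes, and all function names are hypothetical.

```python
import hashlib


def set_bits(opcodes, bits):
    """Force each opcode's direction bit (bit 1) to the corresponding bit."""
    return [(op & ~0b10) | (b << 1) for op, b in zip(opcodes, bits)]


def digest_bits(opcodes, nbits):
    """Stand-in for a signature: the first `nbits` (<= 256) bits of SHA-256 over the opcodes."""
    digest = hashlib.sha256(bytes(opcodes)).digest()
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(nbits)]


def sign(opcodes):
    canonical = set_bits(opcodes, [0] * len(opcodes))  # 1. encode all zeros
    tag = digest_bits(canonical, len(canonical))       # 2. "sign" the canonical form
    return set_bits(canonical, tag)                    # 3. embed the tag in the forms


def verify(opcodes):
    canonical = set_bits(opcodes, [0] * len(opcodes))
    return opcodes == set_bits(canonical, digest_bits(canonical, len(canonical)))


signed = sign([0x31, 0x33] * 64)
tampered = [signed[0] ^ 0b10] + signed[1:]
print(verify(signed), verify(tampered))
```

Verification works because zeroing the direction bits recovers the same canonical form that was signed, whatever tag is currently embedded.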

------
jermier
Cool project. I like to hide data in audio files with Deepsound
[https://deepsound.soft112.com/](https://deepsound.soft112.com/)

~~~
rgovostes
I wrote a blog post on the weak cryptography used by Deepsound:

[https://ryan.govost.es/2018/03/09/deepsound.html](https://ryan.govost.es/2018/03/09/deepsound.html)

~~~
jermier
That's a great writeup. Is it possible to create a really long passphrase
whose hash can't be reversed easily? Perhaps a diceware passphrase with six
randomly chosen words?

~~~
rgovostes
The difficulty of breaking Deepsound is basically equivalent to the difficulty
of reversing a SHA-1 hash. For dictionary words and shorter passwords,
consider them broken instantaneously through pre-computed lookup tables.

For more complex passphrases (and remember, only the first 32 characters count
here), exponential growth probably works in your favor, even with today's
Bitcoin-fueled hyper-accelerated SHA-1 implementations.

Even then, the scheme where they use the password directly as the AES key is
flawed. For example, in ASCII, every octet's most-significant bit is zero, so
32 bits of your AES key are fixed. I don't know if this enables practical
attacks, but anyone who cares about securing their data shouldn't rely on
amateur cryptography like this.
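
The fixed-bit count is easy to check. A quick sketch, assuming (as described above) that the password bytes are used directly as a 256-bit key; the password itself is just an example value:

```python
def known_zero_bits(key: bytes) -> int:
    """Count octets whose most-significant bit is 0 (true of every ASCII character)."""
    return sum(1 for byte in key if (byte & 0x80) == 0)


password = b"correct horse battery staple 123"  # example value, 32 ASCII characters
key = password                                   # assumption: password used directly as the key
print(len(key) * 8, known_zero_bits(key))        # → 256 32
```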

Edit: Oh right, and aside from the password aspect, it uses ECB mode for the
encrypted content. That’s not good.

~~~
Beldin
For those who are curious about ECB: see the picture & encrypted picture of
Tux on
[https://en.m.wikipedia.org/wiki/Block_cipher_mode_of_operati...](https://en.m.wikipedia.org/wiki/Block_cipher_mode_of_operation)

------
082349872349872
An old test suite (Plum Hall?) used to warn it was generated uniquely per-
client, with steganographic watermarking at the C source level.

~~~
tom_
A86 reportedly did something similar:
[https://en.wikipedia.org/wiki/A86_(software)](https://en.wikipedia.org/wiki/A86_\(software\))

~~~
jlokier
Yes, I remember A86 documentation saying it did that to every program it
assembled, to watermark that A86 was used to assemble it.

In the 80s. Blast from the past!

------
crb002
You could do steganography with the ordering of function/data sections vs. an
expected permutation.
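
One way this could work, sketched with the factorial number system: an integer below n! selects one of the n! orderings of n sections, relative to an agreed baseline order (sorted names here, which is an assumption). The section names and function names are illustrative.

```python
from math import factorial, log2


def int_to_permutation(value, items):
    """Encode `value` (0 <= value < n!) as an ordering of `items`."""
    pool = sorted(items)  # the agreed-upon "expected" order
    if not 0 <= value < factorial(len(pool)):
        raise ValueError("value does not fit in this many items")
    result = []
    for radix in range(len(pool), 0, -1):
        index, value = divmod(value, factorial(radix - 1))
        result.append(pool.pop(index))
    return result


def permutation_to_int(ordering):
    """Recover the integer hidden in an ordering."""
    pool = sorted(ordering)
    value = 0
    for item in ordering:
        index = pool.index(item)
        value += index * factorial(len(pool) - 1)
        pool.pop(index)
    return value


sections = ["init", "main", "parse", "util"]
hidden = int_to_permutation(17, sections)
print(hidden, permutation_to_int(hidden))
print(log2(factorial(8)))  # capacity in bits for 8 reorderable sections
```

Capacity grows as log2(n!), so eight reorderable sections carry roughly 15 bits.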

------
steamraven
Does anyone know if other instruction sets like wasm have these kinds of duals?

------
mikorym
I think it is better to call this a noncommutative operation and phrase it as
such. Duals _are_, in fact, often different things.

But I think what you are doing is pretty cool! I don't know much about machine
code, but whenever XOR becomes noncommutative, for whatever reason, but
_executes_ commutatively, then this should become possible.

Of course, by the way, XOR also has the vanilla ability to reveal information
through: plaintext XOR cypher = cyphertext and then you do cyphertext XOR
cypher to get back plaintext. In this case, XOR is commutative, as we are only
looking at execution, and moreover each binary string is its own inverse:
thing XOR thing = 0-string. So, you can think of XOR as reversible sequences
of encypherment.
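
The two XOR identities mentioned here, as a tiny sketch (the key bytes are arbitrary example values):

```python
def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(d ^ k for d, k in zip(data, key))


plaintext = b"hello"
key = bytes([0x13, 0x37, 0xC0, 0xDE, 0x42])

ciphertext = xor_bytes(plaintext, key)
print(xor_bytes(ciphertext, key))        # applying the key again restores the plaintext
print(xor_bytes(plaintext, plaintext))   # anything XOR itself is all zeros
```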

~~~
saagarjha
I'm not sure what xor's commutativity has to do with this?

~~~
mikorym
OP is treating XOR as though it were not commutative between registers and
memory:

> Consequently, there are actually two ways to encode xor eax, eax:
    
    
        ; r/m32, r
        31 C0
    
        ; r, r/m32
        33 C0
    

So, what I am saying is that write(memory,register) != write(register,memory)
in the byte code, meaning that writing the XOR itself is a noncommutative
operation. The machine running the code does, however, treat both options
as ultimately giving the same result.

The XOR algorithm remains unchanged and remains commutative. The execution
routine of the XOR is noncommutative.

So, there are two operations here: XOR and executeXOR, and OP's article is about
using the machine-level noncommutativity in the form of different byte strings
for executeXOR. The machine still thinks that it is commutative for the end
result of the calculation/routine.

My comment about cypher XOR plaintext = cyphertext is just a general comment
about XOR.

~~~
woodruffw
I think you might have misunderstood the modr/m encoding: steg86 _never_
touches memory-register or register-memory operations, only register-register
ones. Nothing about this either requires or implies commutativity.
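
For reference, a minimal sketch of how the ModR/M byte splits into fields and how the register-register case is recognised (mod == 0b11). The helper names are illustrative. With a memory operand, flipping the direction bit would change which side is written, so those forms are not duals.

```python
def decode_modrm(modrm):
    """Split a ModR/M byte into its (mod, reg, rm) fields."""
    return (modrm >> 6) & 0b11, (modrm >> 3) & 0b111, modrm & 0b111


def is_register_register(modrm):
    """True when the r/m field names a register rather than a memory operand."""
    mod, _, _ = decode_modrm(modrm)
    return mod == 0b11


print(decode_modrm(0xC0), is_register_register(0xC0))  # 31 C0 / 33 C0: xor eax, eax
print(decode_modrm(0x03), is_register_register(0x03))  # 33 03: xor eax, [ebx]; 31 03: xor [ebx], eax
```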

~~~
mikorym
Alright, but the principle still holds:

You have some action act(x,y). The encoding in the bitstream can either differ
between act(x,y) and act(y,x), or it can be the same.

The original article states that the _effect_ of act(x,y) and act(y,x) is the
same. But the bitstream may differ. So it is commutative in the operations of
the computer, but not in the bitstream instruction.

