Basic SAT model of x86 instructions using Z3, autogenerated from Intel docs

userbinator · on May 15, 2023

find a bug in Intel's pseudocode

It's well-known that the official docs have bugs; others have had success using AMD's to compare:

https://www.os2museum.com/wp/sgdtsidt-fiction-and-reality/

hedora · on May 15, 2023

The X86-TSO paper is great if you want to learn how to write code for intel machines with more than one core:

https://pages.cs.wisc.edu/~markhill/restricted/cacm10_x86-TS...

It is relevant here because none of the documentation the authors found was correct, so they wrote this and confirmed it matched observed hardware behavior.

Now, I just need to find the equivalent for arm…

gnufx · on May 15, 2023

I don't remember what they cover, but Sewell's group also have models for ARM and POWER which must be referenced from https://www.cl.cam.ac.uk/~pes20/

For anyone who hasn't encountered one, Sewell gives excellent talks, e.g. at CCC, most recently https://media.ccc.de/v/35c3-9647-taming_the_chaos_can_we_bui...

chc4 · on May 15, 2023

This led to some fun times, like the infamous POP SS bug (CVE-2018-8897), which was a privilege escalation vulnerability in basically every kernel on Intel CPUs. But it didn't affect AMD CPUs, because they implemented the instruction as written in the x86 specification, which didn't have the bug that hardware did.

rep_lodsb · on May 15, 2023

That behaviour is actually documented though? It seems to be a corner case that OS developers didn't pay attention to until it was exploited.

It goes all the way back to 1980, where a revision to the 8086 introduced this to prevent an exception or non-maskable interrupt from using an incorrect stack pointer (by making MOV SS / POP SS atomic with the next instruction, which must load SP).

Arguably it should no longer be necessary in ring 3 protected mode, since the stack will be switched automatically - unless the handler itself was in ring 3, but no current OS allows that?

chc4 · on May 15, 2023

Woops, sorry, I was actually thinking of the SYSRET vulnerability (CVE-2012-0217) which was a mismatch between Intel and the written spec[0]. POPSS did also have differences between Intel and AMD implementations but both were vulnerable in different ways.

0: https://xenproject.org/2012/06/13/the-intel-sysret-privilege...

jcranmer · on May 15, 2023

If you want to be pedantic, this isn't from the official documentation (that's the bajillion-paged manual of every instruction), it's from the intrinsics guide here: https://www.intel.com/content/www/us/en/docs/intrinsics-guid...

Ideally, it should pull from the same master repository of information, but given my limited experience with how documentation tends to get built, I would be not at all surprised to learn that it's copy-pasted between several different projects, so that bugs in one don't get reflected in others.

CamperBob2 · on May 15, 2023

As an extremely experienced dev with a ton of Intel assembly experience, I'll confess I have no earthly idea what this is, what it does, who needs it, or what inspired it. Any good pointers to tutorials or introductory material?

peterfirefly · on May 15, 2023

It's all about model checking:

https://en.wikipedia.org/wiki/Model_checking

Three of the pioneers of model checking won the Turing Award back in the noughties:

https://amturing.acm.org/award_winners/clarke_1167964.cfm

https://amturing.acm.org/award_winners/emerson_1671460.cfm

https://amturing.acm.org/award_winners/sifakis_1701095.cfm

Z3 is an amazing theorem prover, bordering on magic.

https://en.wikipedia.org/wiki/Z3_Theorem_Prover

Theorem proving is often used in model checking.

This guy's code produces Z3 theorems for a subset of the x86 instructions. By theorems, I mean "mathematical statements about how they work".

They can be combined so that a small(ish?) piece of x86 code can be turned into a model, which Z3 can prove (or disprove) statements about.

This is very useful for proving a piece of code right or wrong.

Model checking used to require a phd -- now it just requires a bit of effort and "mathematical maturity". We have come a long way towards making it a generally available tool for all/most programmers but there is still a long way to go.

weinzierl · on May 15, 2023

Could this be used to show that two programs (with only instructions from the covered subset) are equivalent?

boomanaiden154 · on May 15, 2023

You can use it to (mostly) validate small snippets are the same. See Alive2 for the application of Z3/formalization of programs as SMT for that [1]. As far as I'm aware there are some problems scaling up to arbitrarily sized programs due to a lack of formalization in higher level languages in addition to computational constraints. With a lot of time and effort it can be done though [2].

1. https://github.com/AliveToolkit/alive2

2. https://sel4.systems/

insulanus · on May 15, 2023

This type of thing can help you formally verify code.

So, if your proof is correct, and your description of the (language/CPU) is correct, you can prove the code does what you think it does.

Formal proof systems are still growing up, though, and they are still pretty hard to use. See Coq for an introduction: https://coq.inria.fr/

aqrit · on May 15, 2023

This was for a super optimizer.

mdaniel · on May 15, 2023

This is unlicensed, as best I can tell, as is the parsing submodule (https://github.com/zwegner/sprdpl)

mattst88 · on May 15, 2023

This is very interesting. I'm going to be excited to see what neat things can be done with it.

garganzol · on May 15, 2023

Count me in. Any kind of formalization is empowering, but what practical applications can it have? I'm sure there are quite few of them, but what are they?

bcrl · on May 17, 2023

I wonder if this would be considered a violation of Intel's copyright on the manuals?

bjourne · on May 15, 2023

What is the engine satisfying?

fooker · on May 15, 2023

It is common to express denotational semantics as SAT/SMT expressions.

For example, consider a function f(x) = x << 1.

A valid sat semantics for this function is: y == f(x) implies y == 2 * x.

bjourne · on May 15, 2023

But what variable assignment is being SATisfied?

fooker · on May 15, 2023

if f(x) == g(x) is satisfiable, you get one value of x for which f(x) == g(x)

if f(x) != g(x) is satisfiable, you get one value of x for which f(x) != g(x)

Both of these are useful in practice.

programmer_dude · on May 16, 2023

This does not answer the question. I think people here know how equations work. But what's the "x" in the article?

fooker · on May 16, 2023

Think of cpu instructions as functions.

x is the set of inputs to an instruction.

bjourne · on May 17, 2023

I see. The goal is to find some x s.t f(x)!=g(x).