Hacker News new | past | comments | ask | show | jobs | submit login
Basic SAT model of x86 instructions using Z3, autogenerated from Intel docs (github.com/zwegner)
153 points by djoldman on May 15, 2023 | hide | past | favorite | 24 comments



find a bug in Intel's pseudocode

It's well-known that the official docs have bugs; others have had success using AMD's to compare:

https://www.os2museum.com/wp/sgdtsidt-fiction-and-reality/


The X86-TSO paper is great if you want to learn how to write code for intel machines with more than one core:

https://pages.cs.wisc.edu/~markhill/restricted/cacm10_x86-TS...

It is relevant here because none of the documentation the authors found was correct, so they wrote this and confirmed it matched observed hardware behavior.

Now, I just need to find the equivalent for arm…


I don't remember what they cover, but Sewell's group also have models for ARM and POWER which must be referenced from https://www.cl.cam.ac.uk/~pes20/

For anyone who hasn't encountered one, Sewell gives excellent talks, e.g. at CCC, most recently https://media.ccc.de/v/35c3-9647-taming_the_chaos_can_we_bui...


This led to some fun times, like the infamous POP SS bug (CVE-2018-8897), which was a privilege escalation vulnerability in basically every kernel on Intel CPUs. But it didn't affect AMD CPUs, because they implemented the instruction as written in the x86 specification, which didn't have the bug that hardware did.


That behaviour is actually documented though? It seems to be a corner case that OS developers didn't pay attention to until it was exploited.

It goes all the way back to 1980, where a revision to the 8086 introduced this to prevent an exception or non-maskable interrupt from using an incorrect stack pointer (by making MOV SS / POP SS atomic with the next instruction, which must load SP).

Arguably it should no longer be necessary in ring 3 protected mode, since the stack will be switched automatically - unless the handler itself was in ring 3, but no current OS allows that?


Woops, sorry, I was actually thinking of the SYSRET vulnerability (CVE-2012-0217) which was a mismatch between Intel and the written spec[0]. POPSS did also have differences between Intel and AMD implementations but both were vulnerable in different ways.

0: https://xenproject.org/2012/06/13/the-intel-sysret-privilege...


If you want to be pedantic, this isn't from the official documentation (that's the bajillion-paged manual of every instruction), it's from the intrinsics guide here: https://www.intel.com/content/www/us/en/docs/intrinsics-guid...

Ideally, it should pull from the same master repository of information, but given my limited experience with how documentation tends to get built, I would be not at all surprised to learn that it's copy-pasted between several different projects, so that bugs in one don't get reflected in others.


As an extremely experienced dev with a ton of Intel assembly experience, I'll confess I have no earthly idea what this is, what it does, who needs it, or what inspired it. Any good pointers to tutorials or introductory material?


It's all about model checking:

https://en.wikipedia.org/wiki/Model_checking

Three of the pioneers of model checking won the Turing Award back in the noughties:

https://amturing.acm.org/award_winners/clarke_1167964.cfm

https://amturing.acm.org/award_winners/emerson_1671460.cfm

https://amturing.acm.org/award_winners/sifakis_1701095.cfm

Z3 is an amazing theorem prover, bordering on magic.

https://en.wikipedia.org/wiki/Z3_Theorem_Prover

Theorem proving is often used in model checking.

This guy's code produces Z3 theorems for a subset of the x86 instructions. By theorems, I mean "mathematical statements about how they work".

They can be combined so that a small(ish?) piece of x86 code can be turned into a model, which Z3 can prove (or disprove) statements about.

This is very useful for proving a piece of code right or wrong.

Model checking used to require a phd -- now it just requires a bit of effort and "mathematical maturity". We have come a long way towards making it a generally available tool for all/most programmers but there is still a long way to go.


Could this be used to show that two programs (with only instructions from the covered subset) are equivalent?


You can use it to (mostly) validate small snippets are the same. See Alive2 for the application of Z3/formalization of programs as SMT for that [1]. As far as I'm aware there are some problems scaling up to arbitrarily sized programs due to a lack of formalization in higher level languages in addition to computational constraints. With a lot of time and effort it can be done though [2].

1. https://github.com/AliveToolkit/alive2

2. https://sel4.systems/


This type of thing can help you formally verify code.

So, if your proof is correct, and your description of the (language/CPU) is correct, you can prove the code does what you think it does.

Formal proof systems are still growing up, though, and they are still pretty hard to use. See Coq for an introduction: https://coq.inria.fr/


This was for a super optimizer.


This is unlicensed, as best I can tell, as is the parsing submodule (https://github.com/zwegner/sprdpl)


This is very interesting. I'm going to be excited to see what neat things can be done with it.


Count me in. Any kind of formalization is empowering, but what practical applications can it have? I'm sure there are quite few of them, but what are they?


I wonder if this would be considered a violation of Intel's copyright on the manuals?


What is the engine satisfying?


It is common to express denotational semantics as SAT/SMT expressions.

For example, consider a function f(x) = x << 1.

A valid sat semantics for this function is: y == f(x) implies y == 2 * x.


But what variable assignment is being SATisfied?


if f(x) == g(x) is satisfiable, you get one value of x for which f(x) == g(x)

if f(x) != g(x) is satisfiable, you get one value of x for which f(x) != g(x)

Both of these are useful in practice.


This does not answer the question. I think people here know how equations work. But what's the "x" in the article?


Think of cpu instructions as functions.

x is the set of inputs to an instruction.


I see. The goal is to find some x s.t f(x)!=g(x).




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: