
How many x86 instructions are there? - ingve
https://fgiesen.wordpress.com/2016/08/25/how-many-x86-instructions-are-there/
======
thechao
I've written assemblers for about 30 different ISAs (ISAE?)---most of them
custom. An ISA (almost always) forms a DAG, where a node is a legal string of
{01X} that is producible from the ISA; the 'X' meaning "0 or 1". An encoding
is a daughter of another encoding if it has a 0 or 1 whenever the parent has a
0 or 1.
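
A minimal sketch of that daughter relation, assuming encodings are written as equal-width strings over {0, 1, X} and that "has a 0 or 1 whenever the parent has a 0 or 1" means the daughter fixes the same bit value the parent does. The function names and the 8-bit patterns are invented for illustration and do not correspond to any real ISA.

```python
def is_daughter(child: str, parent: str) -> bool:
    """Return True if `child` refines `parent`: same width, and every bit
    the parent fixes (0 or 1) is fixed to the same value in the child."""
    if set(child + parent) - set("01X"):
        raise ValueError("patterns may only contain 0, 1 and X")
    if len(child) != len(parent):
        return False
    return all(p == "X" or c == p for c, p in zip(child, parent))


def matches(pattern: str, word: str) -> bool:
    """A concrete bit string matches a pattern if it refines it and has no X left."""
    return "X" not in word and is_daughter(word, pattern)


# Hypothetical 8-bit encodings, made up purely for illustration.
generic = "1011XXXX"
specific = "10110XXX"
other = "1010XXXX"

print(is_daughter(specific, generic))  # True: only narrows the parent's X bits
print(is_daughter(generic, specific))  # False: leaves a bit open that the parent fixes
print(is_daughter(other, generic))     # False: disagrees on a fixed bit
print(matches(generic, "10110001"))    # True: a fully concrete word under the pattern
```

Under this reading every pattern is trivially a daughter of itself, and a pattern can have several parents, which is why the result is a DAG and not a tree.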

A "projection" is the imposition of a tree-like structure on the DAG, followed
by a cut of the tree to make it practical in size. Aliasing mnemonics in
the tree can cause confusion for the user---or aid in understanding.

These articles are noting that there are many ways to project the x86 ISA.

~~~
nibnib
do you mind going into why you have written so many assemblers?

~~~
thechao
My grad work was in system programming language design; I used to toy around
with VM design, so that meant I was (poorly) designing a lot of ISAs and then
writing a lot of assemblers for them. My professional work is unrelated to my
grad work, but has still let me expand on that interest. Sorry I'm so vague.
Think more along the lines of "python byte code assembler", or somesuch thing.

~~~
csl
Do you happen to blog? Would absolutely love to read about best practices,
curious situations and so on.

------
rzzzt
I intend to read the 3-volume edition of Intel's developer manual from start
to finish, but haven't decided when to do it, before or after TAOCP.

------
trentnelson
Neat, I'd come across Pin[1] recently doing some IDA Pro stuff... XED[2] looks
equally interesting.

[1]: [https://software.intel.com/en-us/articles/pin-a-dynamic-
bina...](https://software.intel.com/en-us/articles/pin-a-dynamic-binary-
instrumentation-tool)

[2]: [https://software.intel.com/en-us/articles/xed-x86-encoder-
de...](https://software.intel.com/en-us/articles/xed-x86-encoder-decoder-
software-library)

------
yesimahuman
One question this left me: are we leaving performance off the table by using
compilers that are only using a "standard" subset of the possible instructions
out there?

~~~
seabrookmx
Define "we". I think there are definitely some applications doing this.

Last I checked, using default GCC settings you can drastically change the
performance of the resulting app by passing different flags. This obviously
depends a lot on the app, but I recall a simple raytracing program I wrote in
C++ in University being sped up quite a bit when I passed it flags to optimize
for Pentium4+, because it leveraged certain SSE instructions.

MSVC++ with optimizations turned on actually bundles multiple copies of the
binary in the resulting .exe file, and is able to pick the optimal one (based
on the ISA extensions your CPU supports) at runtime. I'm sure you can do
something comparable with binaries produced by GCC and Clang.

Now.. do I think these improperly optimized binaries are a problem plaguing us?
No. Because the large native applications that would really benefit (such as
Chrome/V8) have already dealt with this. The smaller apps (think a small GTK
app like gedit) won't see a noticeable performance benefit. And most newer,
consumer applications (Spotify, Slack, Skype) are written in something like
Node anyways.

------
legulere
This article makes me kind of wonder whether there are processors that
understand two or more unrelated instruction encoding sets.

~~~
wolfgke
What do you consider as "unrelated"? For example modern ARMv8 cores understand
3 instruction sets: AArch32, Thumb2 and AArch64. AArch32 and Thumb may be
related, but AArch64 is more different (no conditional execution, more
registers etc. for example). Though you would perhaps still consider them as
related.

EDIT: Similarly, modern x86-64 cores support at least three instruction sets:
x86-16 (real mode), x86-32 (protected mode) and x86-64 (long mode). These are
related, but at least I claim that x86-16 and x86-64 are _very_ different in
practise.

A perhaps better example is Jazelle on older ARM cores, which supported
executing most of Java VM's instruction set directly on the processor.

Or another example: The instruction set of many processors is separated into
several parts that work and encode quite differently. For example the x87
instructions are stack-based instead of register-based. Or I have heard that
AltiVec instructions encode and work quite differently from normal PowerPC
instructions. The only reason why these are not considered as different
instruction sets is that they lie in the same opcode space. If you consider
instruction sets that "just lie in the same opcode space, but are otherwise
mostly unrelated" as unrelated, then there are some instruction sets that
evolved as hybrids of different instruction sets for controlling different
functional units.

Or another example: many SoCs contain additional coprocessors, and ARM's
instruction set contains instructions for controlling them. These coprocessors
execute a different instruction set. But perhaps you would consider this not as
"one processor" but rather several communicating processors on one chip.

TLDR: Define what you mean with "[one] processor" and with "unrelated" in
terms of encoding of instructions.

~~~
legulere
Your examples are all instruction sets that were developed with the idea in
mind that they run together with the other instruction sets. What I meant with
unrelated is instruction sets that have been developed independently and were
not meant to run on the same cpu from the beginning. For instance a cpu that
supports both the Risc-V and the Power architecture.

~~~
spc476
There was the V20/V30. It was a drop-in replacement for the 8088 that could
also execute 8080 (or Z80---it's been awhile since I used one) code (it was
also faster at executing 8088 code, which is why a lot of people used it).
While the 8088 is descended from the 8080, it's not binary compatible
(completely different opcodes) and probably fits your criteria.

~~~
wolfgke
It was the Intel 8080 not the Z80 for which the V20/V30 had an emulation mode
(just looked it up).

To do some nitpicking on this great example: The Intel 8086/8088 was designed
to be assembly source compatible (up to search & replace) with the Intel 8080
(source:
[https://en.wikipedia.org/w/index.php?title=Intel_8086&oldid=...](https://en.wikipedia.org/w/index.php?title=Intel_8086&oldid=735318399#The_first_x86_design)).
People like legulere would thus clearly nitpick that the criterion
"instruction sets that have been developed independently" is not satisfied
here.

------
brakmic
Sorry, if I may sound off-topic, but it seems that XED-downloads aren't
working at all.

------
creshal
> ISAs (ISAE?)

-ae only if the word is a) not an abbreviation, b) Latin, and c) a-declension.

~~~
sdegutis
Pretty sure GP was joking. But anyway, my favorite word to throw out there in
fun little pedantic conversations like this is octopus, whose plural ends up
not actually being octopi like most people think, but rather octopodes, due to
it being from Greek and not Latin.

~~~
creshal
Grammar is always fun, even more so when you start mixing several languages.

~~~
PeCaN
Especially in the same word! _cough_ television _cough_

~~~
cm3
hexadecimal, was originally and correctly sexadecimal, but when it got wider
use somehow was turned into hexadecimal instead of the also correct
hexagecimal.

------
gsmethells
tldr; What is the answer? Curious.

~~~
jonknee
The post begins:

> It’s surprisingly hard to give a good answer (the question was raised in
> this article). It depends on how you count, and the details are interesting
> (to me anyway).

------
rev_null
It supports up to 4,294,967,296 instructions.

~~~
gruez
Where did you get that number from? Instructions are not limited to 32 bits in
length.
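
A quick arithmetic check of the quoted figure, as a sketch only. The one outside fact assumed here is the 15-byte architectural limit on x86 instruction length; the variable names are made up for illustration. Most long bit strings are not valid instructions, so this bounds the number of bit patterns, not the number of instructions.

```python
MAX_X86_INSTRUCTION_BYTES = 15  # assumed architectural length limit on x86

quoted_count = 4_294_967_296
# The quoted figure is exactly the number of distinct 32-bit patterns.
print(quoted_count == 2 ** 32)  # True

# An instruction may occupy far more than 32 bits.
max_bits = MAX_X86_INSTRUCTION_BYTES * 8
print(max_bits)  # 120

# Ratio of 120-bit patterns to 32-bit patterns.
ratio = 2 ** max_bits // quoted_count
print(ratio == 2 ** 88)  # True
```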

