
A survey of attacks against Intel x86 over last 10 years (2015) [pdf] - DoctorBit
https://blog.invisiblethings.org/papers/2015/x86_harmful.pdf
======
jhoechtl
Words are not enough to express my deep and profound disgust at all those
morons abusing "... considered harmful". It's somewhat OK if it was coined
years ago and counts as legacy. But for any new abuse, all I find are the
words of Honeybunny: https://genius.com/Tim-roth-pumpkin-and-honey-bunny-annotated

------
snvzz
x86 has had a good run. But it's time for it to go.

I'm hopeful for RISC-V.

~~~
zaarn
I'm putting my bets on Mill winning the race eventually.

And it's not only Mill: Microsoft has been working on a VLIW too (yes, I
know, not exactly VLIW, but it's close enough).

IMO these ISAs are the future; compilers and programming languages are now
smart enough to figure out how to handle VLIW compilation (plus they have
learned from Itanium's failures).

~~~
AstralStorm
No, they really cannot. Many of the optimizations a CPU does on the fly (on
the microcode and scheduling side) are akin to what JIT recompilers do. These
cannot be done effectively ahead of time yet, at least not without
instruction-accurate profiling.

Not to mention that VLIW wastes CPU instruction cache on instructions that
aren't run.

It is no accident that CPUs and compilers gravitated towards RISC.

~~~
zaarn
Well, no, the point of VLIW is that instead of doing it like a JIT recompiler
you do it like a slow, offline recompiler. This is almost always possible and
can be done ahead of time. Proof: the CPU itself does it under a time
constraint, so a compiler should be capable of the same minus the time
constraint.
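
To make the "slow recompiler" idea concrete, here is a minimal sketch
(Python, made-up op names and dependencies, nothing resembling a real ISA)
of the kind of static scheduling a VLIW compiler does: pack instructions
into fixed-width bundles so that nothing in a bundle depends on anything in
the same or a later bundle.

    # Toy static VLIW scheduler: greedily pack independent ops into
    # fixed-width bundles ahead of time. Ops, deps and width are made up.
    def schedule(ops, deps, width=4):
        """ops: op names in program order; deps: op -> set of ops it needs."""
        scheduled, bundles, remaining = set(), [], list(ops)
        while remaining:
            # an op is ready once everything it depends on sits in an
            # earlier bundle (results aren't visible within a bundle)
            bundle = [op for op in remaining
                      if deps.get(op, set()) <= scheduled][:width]
            if not bundle:
                raise ValueError("dependency cycle")
            scheduled |= set(bundle)
            remaining = [op for op in remaining if op not in bundle]
            bundles.append(bundle)
        return bundles

    # a and b are independent; c needs both; d needs c
    print(schedule(["a", "b", "c", "d"], {"c": {"a", "b"}, "d": {"c"}}))
    # -> [['a', 'b'], ['c'], ['d']]

A real compiler also models latencies and functional units, but all of that
can happen offline.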

VLIW also doesn't really waste instruction cache if your compiler is smart
and aligns branches to an instruction word. You still blow the pipeline on a
mispredicted branch, but at least in Microsoft's case they give the compiler
a way to encode a branch prediction, which it is arguably in a better
position to make. This goes double if you use profile-guided optimization.
If the claims of the Mill guys are true, then even the "wasted CPU
instruction cache" doesn't hurt performance.

CPUs and compilers are gravitating towards lots of things; x86 and ARM aren't
the only instruction sets. VLIW is alive and healthy on a lot of DSPs. There
are Russian VLIW CPUs in active use. AMD GPUs used VLIW for a while (and some
variants still do). You can even get VLIW-based microcontrollers for cheap.

IMO compilers and CPUs may gravitate towards RISC in the short term, as it is
more similar to CISC in terms of complexity. VLIW needs compilers to be
smart, and languages to be smart too, for optimal use. Rust, for example,
would be capable of really taking advantage of VLIW, but LLVM doesn't support
that complexity (yet, though there is some work).

In the long term, my prediction is that VLIW will dominate by virtue of being
simpler, faster and more efficient.

~~~
PDoyle
Your proof is flawed. The CPU has access to the complete current program
state, and also complete knowledge of its own hardware. A static compiler has
neither. Therefore, it's not at all clear that a compiler can do whatever the
CPU can do.

Example: the best order to run a sequence of instructions could depend on
which inputs happen to be in the L1 cache at the time. This could differ from
one execution to the next. There's no way for a static compiler to get this
right.

~~~
zaarn
You don't need access to the full current program state; most of what
out-of-order execution does can be done statically with simple graph coloring
and knowledge of the number of registers. A static compiler knows the number
of registers available, as well as several other intrinsic hardware features
(after all, it has to use them).
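
For concreteness, a toy version of the graph-coloring step (the usual
register-allocation formulation; the variables, interference graph and
register count are all made up, and a real allocator does much more):

    # Greedily colour an interference graph with k registers.
    def color(interference, k):
        """interference: var -> set of vars live at the same time."""
        assignment = {}
        # colour the most-constrained variables first
        for var in sorted(interference, key=lambda v: -len(interference[v])):
            used = {assignment[n] for n in interference[var] if n in assignment}
            free = [r for r in range(k) if r not in used]
            assignment[var] = free[0] if free else "spill"  # out of registers
        return assignment

    # a, b and c are all live at the same time; d only overlaps c
    graph = {"a": {"b", "c"}, "b": {"a", "c"},
             "c": {"a", "b", "d"}, "d": {"c"}}
    print(color(graph, k=2))
    # -> {'c': 0, 'a': 1, 'b': 'spill', 'd': 1}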

On a VLIW, many more such features would necessarily be exposed, and the
compiler would have to take advantage of them.

Your example can be optimized by a compiler trivially by optimizing for cache
locality, something compilers already do. It simply means that if you access
memory address X in one place and your code accesses the same address
elsewhere, the compiler will try to keep those two accesses closer together.

Making a simple prediction about cache contents is trivial for compilers and,
as mentioned, already happens. You can use a graph to build up the
dependencies on memory and then reduce the distance between connected nodes
along the execution path. Since this is VLIW and we may be able to tell the
CPU which branch is likely, we can even skip this in favor of optimizing the
happy path better.
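
As a rough sketch of that graph idea (toy Python, made-up loads and string
addresses; a real compiler reasons about cache lines and aliasing): reorder a
straight-line block so accesses to the same address land next to each other,
while still respecting dependencies.

    # Greedy clustering: prefer the ready op touching the last-used address.
    def cluster(ops, deps):
        """ops: list of (name, address); deps: index -> set of indices
        that must come first."""
        order, done, last_addr = [], set(), None
        while len(order) < len(ops):
            ready = [i for i in range(len(ops))
                     if i not in done and deps.get(i, set()) <= done]
            # ops touching the address we just touched sort first
            ready.sort(key=lambda i: ops[i][1] != last_addr)
            pick = ready[0]
            done.add(pick)
            order.append(pick)
            last_addr = ops[pick][1]
        return [ops[i][0] for i in order]

    ops = [("load1", "X"), ("load2", "Y"), ("load3", "X"), ("load4", "Y")]
    print(cluster(ops, {}))        # -> ['load1', 'load3', 'load2', 'load4']
    print(cluster(ops, {2: {1}}))  # load3 now has to wait for load2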

A modern optimizer is a very complex beast; it can certainly know some things
about the state of the program at runtime, and it will make some assumptions
about it (enable -O3 if you want to test). It is most certainly able to
optimize your example in at least a minimal fashion on more aggressive
settings.

To my knowledge, the CPU pipeline does not schedule based on L1 cache
contents, since checking the contents of the L1 cache is still rather
expensive and the lookahead in the instruction queue is usually limited to a
few hundred instructions. Hitting L1 is still an order of magnitude slower
than hitting a register, and doing it for every memory access instruction
would be very expensive. The pipeline tends to favor branch predictors and
register dependencies, which are simpler and faster, along with some
historical data about previously executed code.

------
okket
Previous discussion from 3 years ago (169 comments):
https://news.ycombinator.com/item?id=10458318

------
jstewartmobile
An oldie, but a goodie.

There is so much quality software out there that is either open source, or
written in a VM language, or both, that it boggles the mind why we are still
munching on this particular shit sandwich.

But what do I know?

------
vages
Be courteous and add a (2015) to the title.

------
lioeters
[PDF] (2015) - Probably better without the URL hash.

Still relevant.

------
stakhanov
...if I have to endure one more abuse of the "considered harmful" idiom, I'm
going to puke.

~~~
kuroguro
Puking is considered harmful.

