A survey of attacks against Intel x86 over last 10 years (2015) [pdf] (invisiblethings.org)
86 points by DoctorBit 9 months ago | 40 comments

Words are not enough to express my deep and profound disgust at all those morons abusing "... considered harmful". It's somewhat OK if it was coined years ago and counts as legacy, but for any new abuse, all I can find are the words of Honey Bunny https://genius.com/Tim-roth-pumpkin-and-honey-bunny-annotate...

x86 has had a good run. But it's time for it to go.

I'm hopeful for RISC-V.

I'm putting my bets on Mill winning the race eventually.

It's not only Mill: Microsoft has been working on a VLIW too (yes, I know, not exactly VLIW, but it's close enough).

IMO these ISAs are the future; compilers and programming languages are nowadays smart enough to figure out how to handle VLIW compilation (plus we've learned from Itanium's failures).

No, they really cannot. Many of the optimizations CPUs do on the fly are akin to JIT recompilation (in microcode and scheduling). These cannot be done effectively ahead of time yet, at least not without instruction-accurate profiling.

Not to mention VLIW wastes CPU instruction cache on instructions that aren't run.

It is no accident that CPUs and compilers gravitated towards RISC.

Well, no, the point of VLIW is that instead of doing it like a JIT recompiler, you do it like a slow ahead-of-time recompiler. This is almost always possible. Proof: the CPU itself does it under time constraints; a compiler should be capable of the same, minus the time constraints.

VLIW also doesn't really waste instruction cache if your compiler is smart and aligns branches to an instruction word. You still blow the pipeline on a mispredicted branch, but at least in Microsoft's case they include a way for the compiler to embed a prediction, which it is arguably in a better position to make. This goes double if you use profile-guided optimization. If the claims of the Mill guys are true, then even the "wasted CPU instruction cache" doesn't hurt performance.
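
The "slow ahead-of-time recompiler" idea can be sketched as greedy list scheduling: pack a dependency DAG into fixed-width instruction words at compile time. The 2-slot machine model and unit latencies here are made up purely for illustration, not any real VLIW ISA:

```python
# Toy ahead-of-time VLIW bundler: greedy list scheduling of a dependency
# DAG into fixed-width instruction words. The machine model (2 issue
# slots, unit latencies) is a hypothetical illustration.

def schedule(ops, deps, width=2):
    """ops: list of op names in program order; deps: {op: set of ops it
    depends on}. Returns a list of bundles (tuples padded with 'nop')."""
    done, bundles = set(), []
    remaining = list(ops)
    while remaining:
        # Ops whose dependencies were all issued in *earlier* bundles.
        ready = [op for op in remaining if deps.get(op, set()) <= done]
        if not ready:
            raise ValueError("dependency cycle")
        issue = ready[:width]
        bundles.append(tuple(issue + ["nop"] * (width - len(issue))))
        done |= set(issue)
        remaining = [op for op in remaining if op not in issue]
    return bundles

# a and b are independent and dual-issue; c needs both, d needs c,
# so each of those gets its own bundle padded with a nop.
bundles = schedule(["a", "b", "c", "d"], {"c": {"a", "b"}, "d": {"c"}})
```

A real scheduler would also model latencies and functional units, but the shape is the same: all the reordering happens once, offline, instead of in hardware on every execution.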

CPUs and compilers are gravitating towards lots of things. x86 and ARM aren't the only instruction sets. VLIW is healthy and very alive on a lot of DSPs. There are Russian CPUs in active use that are VLIW. AMD GPUs used VLIW for a while (and some variants still do). You can even get VLIW-based microcontrollers for cheap.

IMO compilers and CPUs may gravitate towards RISC in the short term, as it is more similar to CISC in terms of complexity. VLIW needs compilers to be smart, and languages too, for optimal use. Rust, for example, would be capable of really taking advantage of VLIW, but LLVM doesn't support that complexity (yet, though there is some work).

In the long term, my prediction is that VLIW will dominate by virtue of being simpler, faster, and more efficient.

Your proof is flawed. The CPU has access to the complete current program state, and also complete knowledge of its own hardware. A static compiler has neither. Therefore, it's not at all clear that a compiler can do whatever the CPU can do.

Example: the best order to run a sequence of instructions could depend on which inputs happen to be in the L1 cache at the time. This could differ from one execution to the next. There's no way for a static compiler to get this right.

You don't need access to the full current program state; most of out-of-order execution can be done with simple graph coloring and knowledge of the number of registers. A static compiler knows the number of registers available, as well as several other intrinsic hardware features (after all, it has to use them).
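
The graph-coloring idea referenced here is the classic static stand-in for what an out-of-order core does with register renaming at runtime: variables that are live at the same time "interfere" and must land in different physical registers. A minimal greedy sketch (the heuristic and the tiny interference graph are illustrative, not a production allocator):

```python
# Greedy graph coloring of a register interference graph.
# interference: {var: set of vars live at the same time}; k registers.

def color(interference, k):
    """Returns {var: register index}, or raises if k registers
    don't suffice under this (non-optimal) greedy ordering."""
    assignment = {}
    # Color higher-degree nodes first, a simple common heuristic.
    for var in sorted(interference, key=lambda v: -len(interference[v])):
        taken = {assignment[n] for n in interference[var] if n in assignment}
        free = [r for r in range(k) if r not in taken]
        if not free:
            raise RuntimeError(f"would need to spill {var}")
        assignment[var] = free[0]
    return assignment

# x is live alongside y and z, but y and z never overlap,
# so two registers suffice: y and z can share one.
regs = color({"x": {"y", "z"}, "y": {"x"}, "z": {"x"}}, k=2)
```

Chaitin-style allocators add simplification and spilling on top of this, but the core constraint, interfering variables get distinct colors, is exactly this.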

On a VLIW, a lot more features would necessarily be exposed, and the compiler would have to take advantage of them.

Your example can be handled by a compiler trivially by optimizing for cache locality, something compilers already do. It simply means that if your code accesses memory address X in several places, the compiler will try to keep those accesses closer together.

Making a simple prediction about cache contents is trivial for compilers and, as mentioned, already happens. You can build dependency graphs over memory accesses and then reduce the distance between connected nodes in the execution path. Since this is VLIW and we may be able to tell the CPU which branch is likely, we can even skip this in favor of optimizing the happy path better.
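
The "reduce the distance between connected nodes" pass can be sketched very simply: given a stream of independent memory accesses, cluster the ones that touch the same address so each line stays hot while it's used. This toy version (hypothetical names and addresses) only reorders; a real pass would first prove the accesses independent:

```python
from collections import OrderedDict

def cluster_by_address(accesses):
    """accesses: list of (name, addr) in program order. Returns a
    reordering that keeps per-address program order but clusters
    touches of the same address together, in first-touch order.
    Only legal if the reordered ops have no dependencies between them."""
    groups = OrderedDict()
    for name, addr in accesses:
        groups.setdefault(addr, []).append(name)
    return [name for names in groups.values() for name in names]

# Two addresses interleaved; clustering keeps each one hot in cache.
order = cluster_by_address(
    [("l1", 0x10), ("l2", 0x20), ("l3", 0x10), ("l4", 0x20)]
)
```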

A modern optimizer is a very complex beast; it can certainly know some things about the state of the program at runtime, and it will make some assumptions about it (enable -O3 if you want to test). It is most certainly able to optimize your example in at least a minimal fashion at more aggressive settings.

The CPU pipeline, to my knowledge, does not optimize by L1 cache contents, since checking the L1 cache is still rather expensive and the lookahead in the instruction queue is usually limited to a few hundred instructions. Hitting L1 is still an order of magnitude slower than hitting a register, and very expensive to do for every memory-access instruction. The pipeline tends to favor branch predictors and register dependencies, which are simpler and faster, along with some historical data about previously run code.

> Not to mention VLIW wastes CPU instruction cache on instructions that aren't run.

Nah. You can have variable-length VLIW instructions.

And you can design an out-of-order VLIW that still has the significant advantage of decoding many operations at once.

>Microsoft has been working on a VLIW too (yes I know, not exactly VLIW, it's close enough).

VLIW isn't going anywhere. Generating code for VLIW isn't getting any easier anytime soon, and the complexity doesn't help the formal verification that has become a must thanks to the data-protection requirements of the modern world.

If we've learned anything in the past decades, it'd be that RISC is the only valid approach going forward.

There is a lot of research on VLIW even today, and there is a lot of VLIW hardware out there (DSPs, Russian hardware, microcontrollers, etc.).

RISC is not the one-true-way and I don't think there is evidence for that. x86 is at this point an overburdened platform so any alternative that is fresh, so to speak, is viable.

RISC is maybe the way forward in the short term as it is more similar to CISC.

For general-purpose code (that is, not DSPs and the like), Intel, HP, and whoever else they managed to suck into their vortex of doom made a zillion-dollar bet on VLIW, and ultimately it all belly-flopped. Not just some academic paper: working high-end silicon on a leading-edge process, big investments in compilers, and whatnot.

Why would it be different the next time?

Because of experience from past mistakes.

Mill is claiming some insane IPC improvements and also hammers out a lot of the security misfortunes x86 has (such as the ability to overflow the stack into a return address, or the ring 0/ring 3 security model that is so popular).

The EPIC CPU that Microsoft worked on had Windows 10 and Linux running in the end, including LLVM, the C/C++ runtime, BusyBox, FreeRTOS, .NET, and core parts of Visual C++ 2017.

One of the core reasons Itanium sucked ass was, IMO, that the architecture was too different, and compilers as well as programming languages weren't taking advantage of it well enough to make a difference.

The network effect of "runs existing blobs" is strong enough that I consider any change away from ARM's RISC in the mobile space in the next 10 years unlikely. On the desktop the situation is similar, but IIRC MS has been tinkering with a fairly good x86 JIT emulator for Windows that was able to run MS Office fairly decently; that might remove such network effects.

Mill is awesome. I'm afraid it has too many revolutionary ideas in one package, though, which could impede its widespread adoption.

RISC-V has inherited all the bad ideas of x86 by now. It has an SMM equivalent, a separate hypervisor mode (instead of full orthogonality, which lets you get away without one), resident firmware code outside the OS's control, ...

> separate hypervisor mode

RISC-V, without the hypervisor extension, meets the Popek-Goldberg requirements for virtualization, unlike x86. The problem is that classic virtualization requires shadow paging for memory virtualization, which can be slow. The hypervisor extension (not yet finalized) only adds two-level nested paging, a few shadow CSRs, and IIRC a few more interrupt-handling registers. The hypervisor extension itself is virtualizable (again, with shadow page tables).
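
The shadow-vs-nested paging tradeoff can be illustrated with a toy model (single-level page tables and 4 KiB pages, purely for illustration): nested paging walks both the guest and host tables on every access, while shadow paging precomposes them into one table the hardware walks directly, which the hypervisor must then keep in sync:

```python
PAGE = 4096  # toy 4 KiB pages, single-level tables

def translate(va, pt):
    """One level of paging: pt maps a virtual page number to a
    physical page number."""
    return pt[va // PAGE] * PAGE + va % PAGE

def nested(gva, guest_pt, host_pt):
    """Nested paging: guest VA -> guest PA -> host PA, two walks
    per access (done in hardware with the hypervisor extension)."""
    return translate(translate(gva, guest_pt), host_pt)

def shadow(guest_pt, host_pt):
    """Shadow paging: the hypervisor precomposes guest and host
    tables into one table; hardware walks it directly, but every
    guest page-table write forces the shadow to be rebuilt/fixed."""
    return {vpn: host_pt[gpn] for vpn, gpn in guest_pt.items()}

guest = {0: 5, 1: 7}    # guest virtual page -> guest physical page
host = {5: 100, 7: 42}  # guest physical page -> host physical page
# Both schemes agree on the final host physical address.
assert nested(0x1008, guest, host) == translate(0x1008, shadow(guest, host))
```

The slowness the comment mentions comes from keeping the shadow table coherent: without hardware support, every guest page-table update traps into the hypervisor.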

> a few shadow CSRs

Right, I forgot about RISC-V's MSR equivalent, with a worse assembler implementation (though that's simple to fix without touching the ISA), since the standard assembly expects them to be names, not numbers (which could be #defined away to names).

I had a (very emphatically not) fun time updating coreboot's toolchain and code base in lockstep to ensure that RISC-V code remained compilable when these changed.

Is there a different ISA that you would have preferred to see as the basis for an open hardware industry?

Please don't spread misinformation. RISC-V has a well-thought out machine layer which is open source in all existing implementations[1]. You can easily modify the M layer if it's interfering with the performance or security of your OS.

The virtualization extension hasn't been finalized (but they are working with key KVM people), but it will perform a lot better than a theoretically pure self-virtualizable ISA.

[1] https://github.com/riscv/riscv-pk

"Please don't spread misinformation", followed by confirming that all the mis-features I mentioned exist? (You just disagree about them being mis-features.)

How can I run Linux on RISC-V portably without having to implement the SBI?

You said:

> It has an SMM equivalent,

It really doesn't. It has an open-source layer under supervisor mode which works with the OS. This is very different from Intel SMM, a closed-source blob that acts against the interests of the OS and the end user.

> a separate hypervisor mode (instead of full orthogonality which lets you get away without one),

As others pointed out also, this is plain wrong.

> resident firmware code outside the OS' control, ...

As explained, it has an open source machine mode, and we regularly modify both OS and machine mode to work together.

> How can I run Linux on RISC-V portably without having to implement the SBI?

Why would you want to? Porting Linux to M mode is going to be tricky because there's no paging, but other OSes could run there.

> This is very different from Intel SMM, a closed source blob that acts against the interests of the OS and the end user.

I'm not talking about implementations (there are open-source SMM implementations that act against no one's interest), but about the mechanism. Its mere presence is enough that you'll see a closed-source blob on RISC-V soon enough, and it won't be friendly.

The only way out is not to have that mechanism in the first place. By making it part of the architecture (instead of some vendor addition because they couldn't get the chip to work any other way), it's guaranteed to stay.

> As others pointed out also, [getting away without hypervisor mode] is plain wrong.

Only if you want to deprive user mode of secondary page table capabilities. Instead of making it a separate mode, make it an optimized instruction, and even a "virtual hypervisor" would be relatively efficient (two invocations of that instruction instead of one invocation plus a software implementation).

> As explained, it has an open source machine mode

And as explained, it doesn't matter that the early hackers have some open-source code. If RISC-V ever takes off (and given how invested you seem to be in it, you probably want that?), it won't stay that way.

SiFive only had to give in this time because the market isn't mature yet. That will change.

>> without having to implement the SBI?

> Why would you want to?

Because I don't want anything happening outside the OS's control. Why can't M mode be left for dead on systems that mostly run in supervisor/user mode? Jump into supervisor mode once; returning to M mode incurs a cold reset or trap (i.e. Don't Do It). No SBI.

One of those decisions that will come to haunt RISC-V.

Companies using any ISA can add secret or closed extensions. That is simply outside the scope of an ISA specification.

If you want to ensure there is no secret stuff in your chip you'd better be prepared to create your own designs and manufacture them in your own foundry. RISC-V would still be an excellent starting point.

Extensions are fair game, of course. As is not buying chips with closed extensions.

But designing the hooks for anti-user binary blobs into the ISA just encourages misbehavior as soon as they can get away with it.

RISC-V code in privileged and protected pockets of the system is so much easier to design, maintain, and update than ISA extensions that I consider it the bigger problem, due to its future ubiquity.

Machine mode is there for very good reasons of microarchitectural design scalability, allowing the same machine model to be presented to the operating system by everything from tiny designs to everything-in-hardware designs. In all existing designs it is open source. It's quite reasonable to have this explicit layer, since in many other cases it has been hidden away (think Transmeta, or many x86 emulators).

Why RISC-V and not e.g. Arm? Why is RISC-V better than x86 from AMD? Serious questions, I have no idea about these topics and would love to learn more.

RISC-V is simpler and easier to implement. Instruction density is very good with the compressed extension, and it is applicable to many niches, from in-order low-power embedded cores to high-performance out-of-order desktop CPUs. We can expect marginal improvements over x86 and ARM across the board.

On the other hand, it is not finished. A number of extensions have yet to be frozen, and we're just beginning to see commercial processors (I mean ones where the ISA is exposed to the end user; NVIDIA already uses RISC-V internally).

I believe you have to pay royalties for Arm while RISC-V is completely open source.

The royalties aren't generally a big deal, being something like 2%. It's other issues, like complexity, startup fees, or flat out being unable to get a license.

This. The world would be different, and probably better, if Samsung, Apple and other third parties could make their own x86 processors.

What's the situation regarding this for Mill/VLIW?

VLIW in its pure form is old enough that anybody should be legally able to build an ISA based on that concept.

The Mill folks are heavily into patents for their tech: https://millcomputing.com/patents/. So besides Mill's general feasibility it depends on how they work those patents.

Nice, I hadn't looked at RISC-V before.

The operation of FENCE.I scares me a little:

> FENCE.I does not ensure that other RISC-V harts’ instruction fetches will observe the local hart’s stores in a multiprocessor system. To make a store to instruction memory visible to all RISC-V harts, the writing hart has to execute a data FENCE before requesting that all remote RISC-V harts execute a FENCE.I

Yikes. That sounds cumbersome for multithreaded code-patching systems, like modern JIT compilers. (A "hart" here is a hardware thread.) It sounds like all threads must poll periodically to check whether they should run a FENCE.I, and then report that they've done it. Doesn't sound like a lot of fun to implement, though maybe better in software than hardware?
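
The protocol the spec text implies can be modeled in a few lines: the writing hart stores the new code, executes a data FENCE, bumps a generation counter, and waits; every other hart polls the counter in its run loop, executes FENCE.I when it changes, and acknowledges. This Python sketch only models the coordination (the fences are comments; the class and its API are made up for illustration):

```python
import threading

class ICacheShootdown:
    """Coordination sketch for cross-hart instruction-cache patching.
    Events and barriers stand in for the actual FENCE / FENCE.I
    instructions a real implementation would execute."""

    def __init__(self, n_harts):
        self.generation = 0
        self.barrier = threading.Barrier(n_harts + 1)  # harts + writer

    def worker_poll(self, seen):
        """Each hart calls this periodically from its run loop with the
        last generation it acknowledged; returns the generation it is
        now up to date with."""
        gen = self.generation
        if gen != seen:
            # ... execute FENCE.I here on real hardware ...
            self.barrier.wait()  # acknowledge to the writer
        return gen

    def patch(self, apply_patch):
        apply_patch()             # store the new instructions
        # ... execute a data FENCE here on real hardware ...
        self.generation += 1      # request a remote FENCE.I
        self.barrier.wait()       # every other hart has now fenced
```

A JIT would presumably piggyback the poll on existing safepoint checks rather than busy-polling, but the writer still blocks until the slowest hart notices, which is the cumbersome part.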

I don't know enough to say if this is accurate or not. However there are working groups reviewing the memory model[1] (also implementing fast ISRs[2]) so if there are performance problems in this area then they're being looked at.

[1] https://content.riscv.org/wp-content/uploads/2018/05/14.25-1... https://content.riscv.org/wp-content/uploads/2018/05/10.40-1...

[2] https://content.riscv.org/wp-content/uploads/2018/05/08.45-0...

Previous discussion from 3 years ago: https://news.ycombinator.com/item?id=10458318 (169 comments)

An oldie, but a goodie.

So much quality software out there is either open source or written in a VM language, or both, that it boggles the mind why we are still munching on this particular shit sandwich.

But what do I know?

Be courteous and add a (2015) to the title.

[PDF] (2015) - Probably better without the URL hash.

Still relevant.

...if I have to endure one more abuse of the "considered harmful" idiom, I'm going to puke.

Puking is considered harmful.

I feel that. Normally it's hyperbole, but a full Minix OS with an IP stack we don't control, running on all our servers, is actually pretty harmful.
