
It can be useful to test code that's supposed to be portable on a system with a weaker than x86 memory model.

A self-synchronizing instruction stream can also be a security advantage, and a fixed instruction width means Power and AArch64 are trivially self-synchronizing. And honestly, just not running the dominant ISA is a pretty big security advantage in some cases.

I'm pretty sure that neither of those is a huge deal for most people but I can see wanting it.



> And a self-synchronizing instruction stream can be a security advantage and a fixed instruction width means Power and AArch64 are trivially self-synchronizing.

AArch64 is aligned, but not self-synchronizing. While a misaligned address wouldn't be executable, reading code at an arbitrary offset can still yield valid instructions. I'd imagine Power has the same issue, since it's really hard to avoid this in general if you want a sane encoding and support for immediates.

Somewhat interestingly, x86 is variable length, but I have heard that it is often "eventually self-synchronizing": apparently, if you start decoding at the wrong offset, it will decode a couple of instructions incorrectly but usually ends up back on the correct instruction boundaries.


Generally, fixed-width instruction sets require that instructions be aligned; that is, instruction addresses end with two zero bits if instructions are 32 bits wide. One benefit of this is that a jump can reach 4 times as far for a given constant size. In the case of AArch64, this allows 128 MB branches with a 26-bit signed constant, or 1 MB conditional branches with a 19-bit constant.

Another benefit is that the fetch stage doesn't have to handle corner cases like an instruction crossing a cache line boundary. I don't think the security implications were anything the designers cared about, but they're a third benefit. Oh, and I think some language implementers have stored garbage-collection-related information in the least significant bits of stored addresses, since it doesn't affect control flow, but I wouldn't swear to that.

You're right that a natural x86 instruction stream will tend to resynchronize fairly quickly. The problem is a malicious instruction stream, which can be designed not to resynchronize for at least long enough to do its thing.


ELI5: How, specifically, does AArch64 have a "weaker than x86" memory model?


When multiple cores access the same memory location, it is expensive for one core to invalidate another's cache, or to ensure operations don't get reordered in either core. These days, most architectures require memory barrier instructions when reading or writing shared data, to guarantee that your core sees another core's writes, and that other cores see yours, in the timely, sequential fashion we expect when writing code that accesses shared variables.

There are architectures (AArch64 among them) that reorder these accesses in a "lax" way, making very few guarantees about what you will see from another core, on the theory that it costs less to keep things loosely synchronized (most data isn't shared anyway, so why waste work creating a unified view across cores? The CPU can also reorder work for better efficiency). Intel is historically one of the most conservative, strictly ordered architectures, requiring fewer barrier instructions and creating the illusion that reads and writes occur more or less on a single timeline.

See also:

https://en.wikipedia.org/wiki/Memory_ordering


Googling those terms will turn up lots of hits, e.g., https://preshing.com/20120930/weak-vs-strong-memory-models/





