We could get a >50% performance boost by ignoring security.
Think about that: a 1.5x to 2x boost on a single core. That's like doubling a CPU's clock speed, except more powerful, since it involves memory caching (which is usually the bottleneck, not raw horsepower).
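A speedup and a performance "drop" are not symmetric numbers, which is easy to trip over here. A minimal sketch of the arithmetic, assuming the loss is measured as a fraction of throughput: recovering a loss d takes a speedup of 1/(1 - d), so 1.5x undoes a one-third loss and 2x undoes a 50% loss. The function name is hypothetical.

```python
def speedup_to_recover(throughput_loss: float) -> float:
    """Speedup needed to undo a fractional throughput loss (0 <= loss < 1)."""
    if not 0.0 <= throughput_loss < 1.0:
        raise ValueError("throughput_loss must be in [0, 1)")
    # Remaining throughput is (1 - loss); dividing by it restores the original.
    return 1.0 / (1.0 - throughput_loss)

for loss in (1 / 3, 0.5):
    print(f"{loss:.0%} loss -> {speedup_to_recover(loss):.2f}x to recover")
```

So "a 50% drop" and "a 50% boost" are different gaps: the first needs 2x to get back, the second is only 1.5x.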
What would the TempleOS of CPUs look like, I wonder?
We ought to have the option of running workloads on such CPUs. Yes, they're completely compromised, but if you're running a game server you really don't have to care very much. Most data is ephemeral, and knowing the data that isn't usually gives an attacker no advantage.
Being able to run twice the workload of your competitors on a single core is a big advantage. You saw what happened when performance suddenly dropped thanks to the patch.
Thus they decided to do permissions checking in parallel with other operations, leading to Spectre. It's interesting to note that protected mode was not originally intended as a true "security feature" immune to all attacks, but rather as a way to provide some isolation against accidents and to expand the available address space. The entire Win9x series of OSes embodies this principle.
x86 running in unreal mode might be close. I wonder if anyone has done any benchmarking of recent x86 CPUs in that mode...
No. Leading to Meltdown. Spectre is fundamental to the way we approached any speculative execution until now.
it would be a lot of machinery
you could also do latency hiding with smt instead of trying to fight it head-on with speculation. ultimately probably more productive, but either the compiler or the programming model or the user has to expose that concurrency.
I remember a 10-year-old Microsoft Research project that implemented an OS that used the .NET managed runtime to enforce security. IIRC, they measured interesting performance differences with CPU memory isolation turned off.
I like the idea that you don't need hardware barriers to isolate programs when they are lobotomized.
Singularity would be just as vulnerable to the recent bugs as contemporary OSes are, possibly more so, because there is even less timing uncertainty when crossing privilege domains, making the attacks even easier.
Intel tried making a 386 variant with only protected mode a long time ago, and it didn't sell very well: https://en.wikipedia.org/wiki/Intel_80376
Not when the bootloader is a UEFI application. UEFI puts the processor into 32-bit protected mode or, typically, long mode (64-bit). So when your bootloader, implemented as a UEFI application, is started, you are already out of real mode.
What makes you believe that? The (crude, emergency, not designed-in) workaround dropping performance by 50% doesn't mean that fixing the issue in the design requires reducing performance by 50%.
Naturally, protecting an indirect branch means that no prediction can occur. This is intentional, as we are “isolating” the prediction above to prevent its abuse. Microbenchmarking on Intel x86 architectures shows that our converted sequences are within cycles of a native indirect branch (with branch prediction hardware explicitly disabled).
For optimizing the performance of high-performance binaries, a common existing technique is providing manual direct branch hints, i.e., comparing an indirect target with its known likely target and instead using a direct branch when a match is found.
Example of an indirect jump with manual prediction:
cmp %r11, known_indirect_target
One example of an existing implementation for this type of transform is profile-guided optimization, which uses runtime information to emit equivalent direct branch hints.
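The "manual direct branch hint" described above is what compilers often call indirect-call promotion. A minimal sketch of its control flow, assuming a single known hot target; Python can't show the effect on the branch predictor (every Python call is dynamic), so this only models the shape of the transform. hot_handler, cold_handler and call_with_manual_hint are hypothetical names.

```python
from typing import Callable

def hot_handler(x: int) -> int:
    # Hypothetically, the target that profiling says is taken most often.
    return x + 1

def cold_handler(x: int) -> int:
    # Some other, rarely taken target.
    return x * 2

def call_with_manual_hint(target: Callable[[int], int], x: int) -> int:
    """Compare the target against the known likely target; on a match,
    call it directly, otherwise fall back to the original indirect call."""
    if target is hot_handler:
        return hot_handler(x)  # "direct" path: callee is fixed in the source
    return target(x)           # fallback: the original indirect call
```

In C the same shape is `if (fp == hot_handler) hot_handler(x); else fp(x);`, where the first branch compiles to a direct call. In a retpoline build, only the fallback path would go through the retpoline.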
And as I interpret this, the reference to FDO is unfortunately confusing and perhaps better placed under the heading of correctness. They're merely saying that the transformation of indirect branches to direct branches is something compilers already do to steer speculative execution down the correct path. Except in the case of a retpoline it's steering speculative execution into a trap.
I'm no expert in this area (nor do I do much assembly programming) but AFAIU branch predictors these days have substantial caches of their own. Disabling branch prediction or flushing the predictor's caches is probably much more costly than simply steering the predictor into a trap. The single pipeline bubble you create is better than the many bubbles that would be created if the branch predictor had to warm up again for later code.
Beyond game servers, this could be the mode of operation of e.g. compute cluster nodes, relying on external firewall for security, and running zero untrusted code. Of course they won't care about meltdown or spectre either.
I don't think this requires a 1000x slowdown but I'm not super familiar with this area of the spec. It should be possible to cache the tainted state of the canvas in a single bit, only updating it when adding new images to the canvas, and checking the bit when converting the canvas to an array.
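A minimal sketch of the single-bit idea, assuming a canvas that only ever gains content through draw_image; Canvas, Image, draw_image and to_array are hypothetical names, not the real browser API (though the HTML spec's 2D canvas does track a similar "origin-clean" flag). The bit is sticky: once cross-origin data is drawn, later draws never clear it, so readback stays a constant-time check.

```python
from dataclasses import dataclass, field

@dataclass
class Image:
    origin: str
    pixels: list[int]

@dataclass
class Canvas:
    origin: str
    pixels: list[int] = field(default_factory=list)
    tainted: bool = False  # cached taint state: one bit per canvas

    def draw_image(self, img: Image) -> None:
        # The bit is only updated when new content is added.
        self.pixels.extend(img.pixels)
        if img.origin != self.origin:
            self.tainted = True  # sticky: later draws never clear it

    def to_array(self) -> list[int]:
        # Readback checks the cached bit instead of rescanning every image.
        if self.tainted:
            raise PermissionError("canvas is tainted by cross-origin image data")
        return list(self.pixels)
```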
How? There's no concept of a tainted WebGL canvas because it just can't load textures without CORS.
Spectre is a much more complex issue. It is more of an "I would never have thought of that" thing. It looks like people in the security community were talking about it even 10 years ago, but it is certainly hard for that knowledge to reach chip designers: most of what security people study would be noise to them, and even security people mostly ignore that work.
Do you have a different source for that?
To see why that toy example is insufficient, consider that you could simply execute the load directly without putting an exception in front of it and you would be able to read the value.
It's just another way to say you're building technical debt into your chip/software if you don't consider security from day one. Because sooner or later you will need that security, if you have tens of millions or more people using that product. So it's not like you can just ignore it. You're just postponing it for later.
> One service pr. VM means no need for virtual address spaces, and no overhead due to address translation. Everything happens in a single, ring 0 address space.
Think also about scientific computing, which, strictly speaking, usually doesn't need to run in a secure multi-user environment. I.e., you could limit security to the entrance gate (login).
This was being explored with Microsoft's Singularity and Midori projects but seems to be a dormant idea.
In the old days, probably the Burroughs B5000, for being high-level, with security and programming-in-the-large support built in before those were a thing. If it's about hackability, the LISP machines, like the Genera OS, that you still can't quite imitate with modern stuff. If you want a correct CPU today, then I've been recommending people build on VAMP, which was formally verified for correctness in the Verisoft project. Just make it multicore, 64-bit, and microcoded. Once that's done, use the AAMP7G techniques for verifying firmware and privileged assembly on top of it. Additionally, microcode will make it easier to turn it into a higher-level CPU for cathedral-style CPU+OS combos such as Oberon, a LISP/Scheme machine, or Smalltalk machines. Heck, maybe even native support for the Haskell runtime for House OS.
Perhaps by making it unnecessary, rather than ignoring it - SPIN is now over 20 years old:
There are other single-address-space safe-language systems that are even older; Genera is from the early '80s.
They're used in bitcoin mining where you do the same operation over and over.
If it were all signed, would the risk of the box being compromised be fairly small, and in the worst case, could any problem be addressed quickly by replacing (rebuilding) the entire box?
[edit to make it clear that this is a question]
In the original x86 protected-mode model (i.e., real security flags at the descriptor level, with page protection bits used only for scheduling paging), this would be a non-issue.
(SC applications tend to be huge fans of using enormous page sizes, like 2 GB, which mitigates page table walk costs)
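Why large pages help: TLB reach is roughly the number of TLB entries times the page size, and any access outside that reach risks a page-table walk. A rough sketch, assuming a hypothetical 64-entry TLB and using the x86-64 page sizes of 4 KiB, 2 MiB and 1 GiB (other architectures offer different large-page sizes); real CPUs have several TLB levels with separate capacities per page size.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

def tlb_reach(entries: int, page_size: int) -> int:
    """Bytes of address space that can be translated without a page-table walk."""
    return entries * page_size

ENTRIES = 64  # hypothetical TLB size; real TLBs differ per level and page size

for name, size in (("4 KiB", 4 * KIB), ("2 MiB", 2 * MIB), ("1 GiB", GIB)):
    print(f"{name:>5} pages: {tlb_reach(ENTRIES, size) / MIB:,.2f} MiB of reach")
```

With the same entry count, going from 4 KiB to 2 MiB pages multiplies the reach by 512, which is why huge working sets benefit so much.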
* Value for money
* memory performance on inter-core access
* more memory channels per core
* still within the x86 ISA, negating the need for any software rewrites.
Our workloads are memory access bound. So the above points hit home.
We're going to try AMD servers for the first time at this research group. If they live up to the promise, Intel finally has some active competition in our realm!
Multicore AVX-512 lowers the frequency
The Qualcomm CPU only has 128-bit wide NEON SIMD, while Broadwell has 256-bit wide AVX2 and Skylake has 512-bit wide AVX-512. This explains Skylake's huge lead over both in single-core performance. In the all-cores benchmark the Skylake lead shrinks, because it has to lower its clock speed when executing AVX-512 workloads. When executing AVX-512 on all cores, the base frequency drops to just 1.4 GHz. Keep that in mind if you are mixing AVX-512 and other code.
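A back-of-envelope sketch of why the all-core lead shrinks, assuming peak throughput scales with float32 lanes per instruction times clock speed. Only the 1.4 GHz all-core AVX-512 figure comes from the text; the 2.0 GHz used for the other cases is a hypothetical placeholder, and the model ignores FMA port counts, memory bandwidth, and the downclocking that wide AVX2 code can also trigger.

```python
def lanes(vector_bits: int, element_bits: int = 32) -> int:
    """Number of elements one SIMD instruction processes."""
    return vector_bits // element_bits

def relative_peak(vector_bits: int, ghz: float) -> float:
    """Crude peak model: lanes per instruction times clock in GHz."""
    return lanes(vector_bits) * ghz

HYPOTHETICAL_GHZ = 2.0  # placeholder clock, not a measured value
neon = relative_peak(128, HYPOTHETICAL_GHZ)
avx2 = relative_peak(256, HYPOTHETICAL_GHZ)
avx512_all_core = relative_peak(512, 1.4)  # 1.4 GHz all-core figure from the text

for name, value in (("NEON", neon), ("AVX2", avx2), ("AVX-512 @ 1.4 GHz", avx512_all_core)):
    print(f"{name:>18}: {value:5.1f} lane-GHz")
```

Under these assumptions, 512-bit code only loses to 256-bit code running at the hypothetical 2.0 GHz if its own clock falls below 1.0 GHz; the width advantage survives the downclock, but by a smaller margin.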
I mean, we know they are all saying "ARM" and "AMD", both as a negotiating tactic and as a way to strategically diversify microarchitectures. That said, I'm not sure AMD can deliver more instructions per second per dollar.
I wonder if it would make sense for some loads to prepare for Itanium usage, or even older Atom architectures? Does anyone deploy Itanium in the cloud?
I am sure you didn’t mean it this way, but it struck me as funny saying AMD supports x86-64. Of course AMD supports x86-64, they invented it. That is why Microsoft and Linux refer to that architecture as AMD64.
Also, I don't believe the single ARM CPU that's vulnerable to meltdown (Cortex-A75) has actually been included in any devices at this point, so for now it is safe to assume any ARM-based device you have is not vulnerable to meltdown.
The large providers are always looking around, it would be silly not to.
In response to recently reported security vulnerabilities, this firmware update is being released to address Common Vulnerabilities and Exposures issue numbers CVE-2017-5715, CVE-2017-5753 and CVE-2017-5754. Operating System updates are required in conjunction with this FW level for CVE-2017-5753 and CVE-2017-5754
AMD also offers superior value for the money on a per-CPU basis, even after discounts (which both Intel and AMD offer generously). This makes them exceptionally suitable for high-memory nodes.
A website can read data stored in the browser for another website, or the browser's memory itself.
Apple A11 CPU, multi-core score 10176
Intel Xeon E5-2696 v4, 2 CPUs, score 82225
Geekbench scores of Intel
Geekbench scores of Apple
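The two Geekbench numbers above can be compared directly with a little arithmetic. A minimal sketch; the per-socket figure assumes the two-socket score splits evenly across sockets, which is an assumption since multi-core scores do not necessarily scale linearly, and it says nothing about per-core or per-watt comparisons.

```python
apple_a11_multi = 10176
xeon_2s_multi = 82225  # two E5-2696 v4 sockets

# Whole machine against the phone SoC.
overall = xeon_2s_multi / apple_a11_multi
# Assumes the score divides evenly between the two sockets.
per_socket = (xeon_2s_multi / 2) / apple_a11_multi

print(f"two-socket Xeon vs A11: {overall:.2f}x")
print(f"per Xeon socket vs A11: {per_socket:.2f}x")
```

This prints about 8.08x for the whole two-socket machine and about 4.04x per socket.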