The connection between the article linked in this comment and the page linked in this post is that there is a potentially huge bug that will be made public soon, and it affects only Intel processors, not AMD - hence the large sale of stock by the Intel CEO.
A lot of people on Intel will suddenly lose a noticeable amount of performance. And in the cloud, if your Intel-based VMs lose 20% of their performance, you are now booting up and paying for roughly 25% more VMs (1/0.8) to handle the same load.
I am no lawyer, so I don't know if this is really allowed. My gut instinct is (a) no, it is not allowed and (b) there will always be some more subtle version of the tactic that is allowed.
If it wasn't in the open, seems...not ideal embargo-wise for AMD to leak it there. Though no one in that thread is complaining about the disclosure, so maybe they either think that part is already known to anyone looking closely, or just don't think it's a very big piece of the exploit puzzle (like, finding a way to get info out via a side channel was the hard part).
my123 does point out that the author of the speculative execution blog post is first in the KAISER paper's acknowledgments, and it looks like the paper was presented at a July conference, so that's an earlier clue out in public, for what it's worth.
I imagine if someone had complaints they would make them in private so as to not make the situation even less ideal embargo-wise.
It also seems, from early benchmarks, that this can slaughter database performance.
If so, then isn't it technically correct that the bug will hit you regardless of virtualization, just with a heavier penalty for VMs?
I've seen nothing except the err-on-the-side-of-caution "let's do it for all of them" approach, and, on the other hand, no indication of other problems.
Another syscall that I think might cause issues is gettimeofday(); that particular call has been optimised to the nth degree, and lots of user programs spam the crap out of it (mostly out of necessity), especially networking and streaming programs. It would be interesting to see what the percentage overhead of page table isolation will be, and its effect on low-end media devices, et al.
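If someone wants to eyeball the cost themselves, here's a minimal sketch (my own, nothing to do with the patch set) that times gettimeofday() in a tight loop; run it on a kernel with and without page table isolation and compare. Caveat: on recent Linux gettimeofday() is usually served from the vDSO without entering the kernel at all, so it may dodge the penalty entirely; timing a "real" syscall would show the worst case.

    /* Rough per-call cost of gettimeofday(); numbers are only indicative. */
    #include <stdio.h>
    #include <sys/time.h>
    #include <time.h>

    int main(void) {
        const long iters = 10 * 1000 * 1000;
        struct timeval tv;
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (long i = 0; i < iters; i++)
            gettimeofday(&tv, NULL);            /* the call under test */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
        printf("%.1f ns per gettimeofday() call\n", ns / iters);
        return 0;
    }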
Note that mutex contention does not itself mean immediately falling back to futex - commonly you'll spinloop first and hope that resolves your contention (fast), then fall back to futex (slow)
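For anyone who hasn't looked inside one, here's a rough sketch of that spin-then-futex shape. This is not how glibc actually implements pthread_mutex; the spin budget and the unconditional wake on unlock are arbitrary simplifications for illustration.

    /* Sketch of an adaptive lock: spin briefly on contention (cheap, stays in
     * userspace), then fall back to futex wait (a syscall, so it pays the full
     * kernel-entry cost - now higher with page table isolation). */
    #include <linux/futex.h>
    #include <stdatomic.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    #define SPIN_TRIES 100            /* arbitrary spin budget for illustration */

    /* 0 = unlocked, 1 = locked */
    static void lock(atomic_int *m) {
        for (int i = 0; i < SPIN_TRIES; i++) {
            int expected = 0;
            if (atomic_compare_exchange_weak(m, &expected, 1))
                return;               /* got it while spinning: fast path */
        }
        int expected = 0;
        while (!atomic_compare_exchange_weak(m, &expected, 1)) {
            /* slow path: sleep in the kernel until the holder wakes us */
            syscall(SYS_futex, m, FUTEX_WAIT, 1, NULL, NULL, 0);
            expected = 0;
        }
    }

    static void unlock(atomic_int *m) {
        atomic_store(m, 0);
        /* real implementations avoid waking when nobody waits */
        syscall(SYS_futex, m, FUTEX_WAKE, 1, NULL, NULL, 0);
    }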
I can't really devote time to countering the unfounded assertion that every contended mutex must be a bug. It certainly isn't consistent with my experience, but if every problem you've solved could have been parallelized infinitely without increasing lock contention, more power to you.
Good, because that's not what I said. If you're heavily hitting futex contention you do have a performance bug, though. You might be confusing that with general contention that's being resolved with a spinlock rather than a futex wait.
> Good, because that's not what I said.
It is literally what you said:
>>> If you have a heavily-contended mutex you already have a major performance bug. If this is the kick in the pants you need to go fix it that's arguably a good thing ;)
> You might be confusing that with general contention that's being resolved with a spinlock rather than a futex wait.
I'm not confusing them at all; I'm literally reading exactly what you wrote. You literally said contended mutexes are necessarily bugs (right here^) and that you considered mutexes to include the initial spinlocks ("note that mutex contention does not itself mean immediately falling back to futex - commonly you'll spinloop first"). But maybe you meant to say something else?
I concur with his opinion. Infrequent contention is not a bug; if there were never any contention you wouldn't need a mutex at all. Frequent contention (or heavy contention, in his words) is a performance bug.
"Heavily" was not dropped intentionally at all. Add it back to my comments. It changes nothing whatsoever. The incredible opinion that every problem can be necessarily parallelized without eventually resulting in contention (and I license you to freely modify this term with 'light', 'heavy', 'medium-rare', 'salted', 'peppered', or 'grilled at 450F' to your taste) is so fantastically absurd that I cannot believe you are debating it. I definitely don't know how you can justify such an unfounded claim with no evidence and I certainly have no interest in wasting time debating it. As I said earlier: if you never encounter problems that exhibit eventual scalability limits, more power to you.
I mean, the parent's argument is wrong, but it isn't that naive. Presumably the argument is that a bad (yet still correct) solution would result in lock contention, while a better solution would e.g. use a different algorithm that is more parallelizable.
DPDK for 10-100 Gbps networking: https://dpdk.org/
SPDK for NVMe storage: http://www.spdk.io/
The queuing and balancing stuff the kernel does makes sense for spinning-rust hard disks and residential networking, but when the underlying hardware is so fast that nothing is ever queued, what are you really doing? At 100 Gbps line speed, a 1518-byte packet takes all of ~120 ns to transmit, or about 360 clock cycles for a 3 GHz processor.
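For reference, the back-of-the-envelope math behind those numbers (constants are just the ones quoted above):

    /* Wire time for a full-size Ethernet frame at 100 Gbps, and the cycle
     * budget that leaves per packet on a 3 GHz core. Ignores preamble/IFG. */
    #include <stdio.h>

    int main(void) {
        double wire_ns = 1518 * 8.0 / 100e9 * 1e9;   /* ~121 ns */
        double cycles  = wire_ns * 3.0;              /* 3 cycles per ns at 3 GHz */
        printf("%.0f ns on the wire, ~%.0f cycles per packet\n", wire_ns, cycles);
        return 0;
    }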
I would personally think that is worse, though please correct me if I'm wrong. The userland driver will run with an isolated PT like any other userland process, won't it? If so, it will suffer the same slowdown that every other process now has every time it has to communicate with the kernel, which I would think would be a lot for a driver.
That is in a nutshell what a "userland driver" is. It's not too far removed from poking the parallel port at 0x378 on your DOS computer :)
Even before this fix, the benefits were massive, as sending a buffer was just writing to some memory, rather than syscalls and copies galore.
Though the patches evolved since then. So I guess we'll see.
The worst case would be something like a frequent syscall followed by code that touches a number of distinct cache lines, which all now require a TLB reload and page walk (even here the cost is tricky to evaluate, since there are various levels where the paging structures can be cached beyond the TLB, so the cost of a page walk varies a lot depending on locality of the paging structures used).
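If anyone wants to poke at that worst case, a sketch like this (entirely my own, so take the exact shape with a grain of salt) alternates a cheap syscall with touches to one cache line per page, which is roughly the pattern described:

    /* Cheap syscall followed by touches to many distinct pages: after the
     * kernel entry/exit each touch may need a fresh TLB fill (a full CR3
     * switch flushes the TLB on parts without PCID). */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/syscall.h>
    #include <time.h>
    #include <unistd.h>

    #define PAGES     1024
    #define PAGE_SIZE 4096

    int main(void) {
        volatile char *buf = malloc((size_t)PAGES * PAGE_SIZE);
        const long iters = 100000;
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (long i = 0; i < iters; i++) {
            syscall(SYS_getpid);                    /* kernel round trip */
            for (int p = 0; p < PAGES; p++)
                buf[(size_t)p * PAGE_SIZE] += 1;    /* one line per page */
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
        printf("%.1f ns per syscall+page-sweep iteration\n", ns / iters);
        return 0;
    }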
In that case (PCID hardware on a PCID-enabled kernel), the performance effect should be more limited to the syscall itself. That said, why is the hit still so big with PCID? Surely just the CR3-swap by itself shouldn't be so slow?
That's how the market works.
OTOH, Intel's SYSRET is actively dangerous and has resulted in severe security holes, and Intel doesn't appear to acknowledge that their design is a mistake or that it should be fixed.
AMD CPUs are differently dumb. If SYSRET is issued while SS=0, then the SS register ends up in a bogus state in which it appears to contain the correct value but 32-bit stack access fails. Search the Linux kernel for "SYSRET_SS_ATTRS" for the workaround.
- /* Assume for now that ALL x86 CPUs are insecure */
And yes, I've read quite a few papers, and I wrote a good fraction of the patches.
When you think about your own workstation, it's not a big deal to build an Intel or AMD system. But when you buy 100k motherboards and spend the time adjusting your tooling to those, from packaging to power, to cooling, to support, to OS code, etc. and then you on a whim decide to get another 100k motherboards of a different architecture, you spend a non-trivial amount of time and money to support those as well. Again, if AMD provides better hardware, it's absolutely worth it. But I personally wouldn't do it based on this bug.
I don't own shares of either AMD or Intel.
But that's the problem. Nothing works 100% of the time, which is why monoculture is bad. When there is a bug that affects 20% of your systems, you can continue operating at 80% capacity, which at a reasonable level of reserve/redundancy means you're still entirely up. With a monoculture the bug affects everything and you're entirely down.
> But when you buy 100k motherboards and spend the time adjusting your tooling to those, from packaging to power, to cooling, to support, to OS code, etc. and then you on a whim decide to get another 100k motherboards of a different architecture, you spend a non-trivial amount of time and money to support those as well.
This is why hardware abstraction is a thing.
It's almost always less expensive to support diverse hardware from the beginning than to wait until after the market shifts.
Eventually the day comes to switch from 68K to PowerPC, or PowerPC to Intel, or Intel to ARM, or ARM to whatever else. Because eventually you save/gain a zillion dollars by switching and it "only" costs three quarters of a zillion to switch.
But it would have cost a tenth that much to have supported diverse hardware from the start, and then the transition is only a matter of using more of the now-superior hardware rather than being stuck on the now-inferior hardware for potentially years while everything is rearchitected from scratch.
This is the mistake in your argument. Motherboards, CPUs, RAM chips, and GPUs are physical, analog objects. For AWS to switch a DC from one mobo to another, only to find out that this one draws 5% more power and their standard backup generator can't handle it, which then starts a chain reaction of upgrades, is going to incur real-world costs. Costs that can't be amortized by writing some code to make the motherboards look the same.
This is basically the A/B testing/multi-armed bandit problem. How much time do you spend exploring alternatives vs how much time do you reap the benefits of the fact that all your hardware is exactly the same and best of breed, as based on your testing?
> When there is a bug that affects 20% of your systems, you can continue operating at 80% capacity, which at a reasonable level of reserve/redundancy means you're still entirely up. With a monoculture the bug affects everything and you're entirely down.
These situations simply don't lose enough money to make up for the gains of a monoculture.
Think about it this way. You probably run or at least know someone who runs SaaS products. Do you/they use five different cloud providers in equal measure to make sure you have diversity if one has an issue? Do you/they use five different software stacks in case there is a remote exploit for RoR and PHP holds things up? Do you/they buy groceries at five different grocery stores in case one of them has an e. coli outbreak that the other four don't? The answer to all of that is no, because no matter how you try to abstract these things, there is a meaningful enough difference between PHP vs Django vs RoR vs Express vs .NET, and between AWS and GCP and Azure, that it would cost you a lot more, not just in billing but in engineering effort, to support them all.
Another example: chances are you've at some point built a RAID array. Did you put different size and performance drives from different manufacturers into it or did you buy N of the same drive type to ensure even performance? If so, why?
Put another way, how much more are you willing to pay on your AWS bill to ensure they are running a mix of ARM, AMD, PowerPC, and Intel chips? Because my guess is that it won't be in the range of 1-2%.
Being physical isn't different. The data center is designed to allow systems that consume up to, for example, 500W. When one consumes 400W and another consumes 420W, they're still fungible. A system that consumes 525W can't be used, but you know that so you don't use those.
> This is basically the A/B testing/multi-armed bandit problem. How much time do you spend exploring alternatives vs how much time do you reap the benefits of the fact that all your hardware is exactly the same and best of breed, as based on your testing?
That isn't the relevant problem. Even if you choose monoculture, you still have to pay the cost of weighing your alternatives to decide which single model to use.
The cost of diversity is that the second best model on some metric is 20% worse than the best. But that is also the advantage, because on some other metric it's 20% better. You can use each model for its strength. And since you can't perfectly predict the future, when something unexpected happens you're better able to handle it, because for any given thing that only some systems can do, you will have some systems that can do it.
> These situations simply don't lose enough money to make up for the gains of a monoculture.
Beware survivorship bias. It's easier to find an active monoculture company that has never had a major problem than one that has, because having a major problem in a monoculture often results in bankruptcy.
> Do you/they use five different cloud providers in equal measure to make sure you have diversity if one has an issue?
For services with high availability requirements, people absolutely do that.
> Do you/they use five different software stacks in case there is a remote exploit for RoR and PHP holds things up?
That wouldn't reduce attack surface. The relevant thing people do is to use two factor authentication.
> Do you/they buy groceries at five different grocery stores in case one of them has an e. coli outbreak that the other four don't?
Having multiple local grocery stores is a thing people want. And people do actually use them, because different stores have the best price or quality for different products.
> chances are you've at some point built a RAID array. Did you put different size and performance drives from different manufacturers into it or did you buy N of the same drive type to ensure even performance? If so, why?
These are spec differences, not supplier differences. There is no issue with using drives of the same size and speed from different manufacturers.
Also compare ZFS, which allows you to efficiently use unmatched drives for the same filesystem.
When you're scaling up/maintaining your DC, you're much more likely to be looking for single-SKU, like-for-like products that allow similar tooling, knowledge base, experience etc... Like you said, monoculture has its benefits in some situations.
Personally, even with this bug I'd be very hesitant to switch. Our previous tests between them for very specific workloads showed our best cost/performance was with Intel over successive generations, and the scaling/tip-over points were different. We have a combination of experience and knowledge around the existing arch and how our applications and workloads interact with it, acquired through a number of pain points that I'm not sure it's worth re-experiencing with another arch.
On the other hand, those running on non-bare metal, cloud based, auto-scaling/automated solutions that have a wider tolerance for individual app performance, are probably in a situation where they care less about this, but at the same time have little to no say in the arch they run on, that decision is left to the cloud providers they use.
just my 2 cents anyways.
The reason every server isn't POWER is: ecosystem. For any random company, switching archs for anything less than a multiple factor gain is a daunting multi-generation proposition. For a hyperscaler like Google the bar is a lot lower but you need a compliant vendor that will do a lot of the long haul platform work. IBM's been trying to establish that for many years and is just about to pull it off. Supply chain is also important, hyperscalers have come to expect buying and building systems a certain way and IBM will now just sell chips or even the IP for you to fab yourself. And of course the total cost calculus: capex, and opex in the form of TDP, support burden.
Google _will_ be using P9 for GPU servers internally. The inflection point for them was I/O and memory bandwidth. So, a paradigm shift was what was needed to turn a juggernaut... and that is what adding a bunch of accelerators to your platform is. Intel has no good solution there.
I believe POWER9 has the ability to be either big or little endian as well, so that helps for compatibility issues, and it's just a matter of whether your application can compile.
Google is pushing both POWER CPU development as well as ARM. They seem to be able to sort this out just fine. You can write tools to sort out the differences. You cannot write tools to fix a major HW issue.
Anyway my 2 cents based on experience and history for whatever the comments of a random person on the intertubes is worth.
At the same time, AMD also has a golden opportunity for some PR and marketing.
I'm not a believer in stock price as a good indicator of anything, sorry to skip that part.
Intel's value is 99% engineering + manufacturing ability + customer relations. It would be a poor CEO indeed who'd direct their IT to start buying AMD because of this alone.
Let's be clear: It's both.
Major system vendors are now offering to apply bootleg IME-removal procedures at the factory on customer request. That service is not free. People are willing to /pay extra/ for no-IME laptops.
Either Intel's marketing and public relations departments are asleep at the wheel, or they've gone to the top to request a friendly switch to disable this and have been told by the legal department that they can't have one.
I don't like the fact that you can't disable the ME, that it's not open source, and that it's vulnerable, any more than anyone else does. But this does seem like hyperbole much more than fact.
95% speculation. The last 5% comes from exercising basic pattern recognition.
I remind you that we're probably talking about interference from the organization that arranged this:
The existence of that program was pure speculation, until it turned out to be totally real.
>(b) fails Hanlon's razor.
This is completely irrelevant to any argument made between two informed participants. It's worse than speculation; it's a plea to glib colloquialisms. Any chance you've got evidence, or even reasoned speculation, supporting the theory that the world's most successful CPU manufacturer has an incompetent marketing department?
> 95% speculation. The last 5% comes from exercising basic pattern recognition.
No, it's all speculation because pattern recognition is not evidence, as applied here. Like, is it possible that I am an NSA agent trying to persuade you that you are safe and shouldn't worry about ME? Of course it's possible. But do you have any evidence of that? No.
"Well, in the past the NSA has asked big companies for backdoors into their products" is a true statement with evidence. "That implies that in this case there is a 5% chance that is exactly what's happening" is 100% speculation because again there is no evidence. If you can find any, I am all ears because honestly I am not a fan of Intel, Intel ME, the NSA, government spying, big corporations taking advantage of consumers, or a number of other things I imagine you and I agree on. But I think I am being rational when I say that chances are this is a stupid bug or number of bugs, plus bad old school thinking on the part of the management team, and not a deliberate NSA feature.
Here is my bit of speculation: if the NSA asked Intel to include a backdoor, wouldn't they both have done a better job of creating it? Why introduce a bug when you can include whatever code you want in closed-source firmware? You can literally add any kind of C&C mechanism you want, because nobody can see what you are doing and nobody would ever know. Is the NSA so stupid as to ask for a bug that can be found and exploited? Is Intel not able to offer a better technical solution? Wouldn't it be to both of their benefits to do this right from the start? Also, why only approach Intel and not AMD? AMD is not as popular but surely has enough market share to warrant spying on.
Why aren't you viewing the possibility of intelligence agencies ordering the Intel ME as one of these future historical facts? If the proof for that became known today, both the agency and Intel would scramble to introduce a better backdoor in the next generation CPUs / MBs and devise a marketing campaign to make it sound good -- and to bash their former selves for "making a mistake" while simply thinking "OK, we're gonna cover it up much better this time and we're gonna twist it in such a way that people would flock to buy it". It's what marketing and spies do; they twist facts. Why is that so non-legit for you?
Furthermore, you're asking why they didn't do a better job if it was a conspiracy. People in closed circles aren't exposed to public criticism, and their thinking is affected in the process. They usually think "meh, good enough, nobody will ever find it anyway". They are humans like you and I, susceptible to bad days or negligence from being tired. Furthermore, it's very likely they were under pressure to make it work quickly, so they took shortcuts. What makes you think the programmers of the intelligence agencies have godlike powers over their (very likely) military superiors? The answer is, they don't. Programmers have no executive powers, and their counsel is usually met with skepticism if it doesn't fit the management's agenda.
When talking about intelligence, our best bet is to make educated guesses. If we had hard facts we would be targets. As mentioned in another reply of mine directed at you -- it's their job to hide the facts. So you requesting proof of these matters is basically refuting all possibility of intelligence agency commission of the Intel ME on the grounds of "hey, you are not the next Edward Snowden so your arguments are invalid".
Meh. You come across as a guy who basically says "my speculation is better than yours". Not constructive.
Your theory in the above comment is that the NSA or equivalent ordered Intel to build a C&C mechanism into their processors. Intel then did a perfect job covering up this request, but did a piss poor job of implementing it due to incompetence and has not managed to correct it for 10 years. There is no indication that this might be the case but because of other unsavory activities by the NSA or equivalent it can be assumed that at some point evidence will be uncovered that you are right and therefore we should accept it as fact. Do I have that right?
Judging by other activities of the intelligence agencies and working with pure speculation -- not hiding from these words, you are correct in calling it that -- I still think it's much more likely that they commissioned the Intel ME.
You mention critical thinking in another comment. Critical thinking, the way I apply it, also requires a historical context to be applied to the situation one is analyzing. Agencies have been doing pretty shady stuff and some of it has been uncovered for the entire world to see.
Critical thinking, the way I apply it, says that the odds are there is foul play. I merely wish you to recognize that this is a more likely scenario than a bunch of coincidences and/or people supposedly making the ME to serve data center sysadmins -- btw many of those sysadmins, including in several threads here on HN, said they never used the ME and named a plethora of other tools instead.
Obviously I am not trying to change the way you think in general. I believe we can both agree that neither of us knows for sure. The human brain's strength is to work with many variables and impose some order on the chaos through pattern recognition and historical info. I am not going to deny this can lead to people drawing awfully misguided conclusions sometimes -- and I've been guilty of that as well! -- but it's the best we have, especially bearing in mind what tiny, imperfect brains we have to work with.
Everything I can name is circumstantial evidence. I accept that. It's the nature of the area. Intelligence data isn't easy to come by.
And I am saying that the confidence interval on that calculation is just orders of magnitude too loose. I am not denying that you could be right. It's just that I am giving that possibility something like a 1% chance of being true, versus something like an 85% chance of this being pure incompetence by Intel management and engineers (the rest being some other explanation that's neither malice nor direct incompetence). I don't think you and I can find common ground on this estimation.
Again though, the ME is a bad thing because it's not open source, it can't be turned off, and it's buggy. Regardless of who ordered its creation, it sucks.
You're also saying, implicitly, that therefore we must default to assuming it is incompetence.
That link isn't a given. Stating that it is incompetence is also speculation, not some kind of universal backup truth.
However, when it comes to that last 5%, I assert that the historical data does not back a claim that Intel's marketing department is incompetent.
Until it's financially worth their while, why would they spend money on it?
You request a proof that's impossible to procure. Are you now gonna claim the lack of this proof supports your thesis?
1. Intelligence agencies have been known to force companies to give them access to their products.
2. Companies have been known to comply, if reluctantly, at least until a whistleblower exposes the program.
3. Intel ME was developed as an on-chip version of an external card that is actually useful.
4. Intel has made poorly engineered products before.
5. Intel isn't in a habit of open sourcing firmware.
6. From a technical standpoint, Intel is fully capable of creating a system that doesn't allow C&C through a bug and an exploit.
7. AMD, the second-largest computer chip maker, does not have a matching system that can't be disabled and that has similar bugs.
Based on this, I'd say it's possible that the NSA (or equivalent) asked Intel to develop ME and add a bug to allow C&C, but very unlikely.
It's also possible that the NSA (or equivalent) asked Intel to develop ME and add C&C and Intel did it through a deliberate bug, but very unlikely.
It's also possible that Intel tried to develop a feature the market might want, and screwed up the implementation. This seems to me to be very likely. It's the simplest explanation (Occam's razor) and it requires only incompetence, not malice (Hanlon's razor), so it's sort of the default most likely option.
If someone can produce an iota of evidence to the contrary I will change my allocation of probabilities appropriately, but so far the evidence is "it could have been done" and "they've been known to spy on people in the past". In my book that's not a strong enough argument.
I would say that too if I were waiting for everyone to sell so that I could buy INTC :-)
If it wasn't intentional, then it wasn't a compromise. So it's not a different point.
Core 2 architecture? Nehalem?
Having a hunch Threadripper will sell extremely well amongst PC enthusiasts this year...
Seems like that was discontinued a long time ago (2011) so was wondering if there was something more recent that happened?
Which is another way of saying you had those lanes, and Intel wanted more money before letting you use what you’d already bought.
Sounds like Intel has just made it unlockable instead of permanent. It just brings to the fore what was already being done, and makes us question again the ethics of pricing models.
Some of the chips will be fully capable of running with all parts enabled, but in a higher power envelope (this is a guess - but I believe that the fully capable chips most likely to be sacrificed are those that have trouble fitting in the ideal power envelope).
I would also imagine that (under some circumstances) chips that are fully functional within the expected power envelope will be artificially limited in order to control levels of stock.
The vast majority of chips that are limited in this way will be out of spec, unstable or inoperable when unlocked.
The chipset unlock thing is different as there's no technical reason to lock it in the first place.
That's binning & price discrimination; Intel did the same (with quads vs duals IIRC): if you have a defective core, you gate it and sell a 2- or 3-core part instead of a quad. Of course the issue is when the low bin becomes too popular and you have to start low-binning "perfect" parts to keep supplies acceptable (this used to be very common for Intel starting around mid-cycle; they'd literally run out of defects, which is why their low-end CPUs had such good performance & were ridiculously overclockable).
However, that (and the later software modification) could both hamper performance in games and exhibit correctness problems in accuracy-focused use cases, so it was rarely a great idea.
A major underlying cause is that we're doing things in hardware that ought to be done in software. We really need to stop shipping software as native blobs and start shipping it as pseudocode, allowing the OS to manage native execution. This would allow the kernel and OS to do tons and tons of stuff the CPU currently does: process isolation, virtualization, much or perhaps even all address remapping, handling virtual memory, etc. CPUs could just present a flat 64-bit address space and run code in it.
These chips would be faster, simpler, cheaper, and more power efficient. It would also make CPU architectures easier to change. Going from x64 to ARM or RISC-V would be a matter of porting the kernel and core OS only.
Unfortunately nobody's ever really gone there. The major problem with Java and .NET is that they try to do way too much at once and solve too many problems in one layer. They're also too far abstracted from the hardware, imposing an "impedance mismatch" performance penalty. (Though this penalty is minimal for most apps.)
What we need is a binary format with a thin (not overly abstracted) pseudocode that closely models the processor. OSes could lazily compile these binaries and cache them, eliminating JIT program launch overhead except on first launch or code change. If the pseudocode contained rich vectorization instructions, etc., then there would not be much if any performance cost. In fact performance might be better since the lazy AOT compiler could apply CPU model specific optimizations and always use the latest CPU features for all programs.
Instead we've bloated the processor to keep supporting 1970s operating systems and program delivery paradigms.
It's such an obvious thing I'm really surprised nobody's done it. Maybe there's a perverse hardware platform lock-in incentive at work.
IBM AS/400 for about 30 years now.
This sounds halfway like ART.
The overall idea has a lot of merit (and, for example, Apple is moving towards this model with the iOS AppStore) - but I don't see how it solves the current problem.
Across a variety of architectures, the market has come down firmly in favor of hardware address translation and protection. There are various implementations, many not subject to the current side-channel, but all of them do most of the heavy lifting in hardware: TLBs and related things "just work".
Let's say you had some intermediate format and executed everything in a single 64-bit address space after a final JIT compilation step (your suggestion, as I understand it). How would you implement process and kernel memory protection? It amounts to a bounds check on every memory access. Certainly you can use techniques common in bounds-checking JITs today to eliminate many of the checks via proof methods, hoisting and combining bounds checks, etc - but the cost would still be large in many cases.
Maybe you want a hardware assist for this bounds checking then? Well follow that to its logical conclusion and you end up with hardware protection support: maybe in a slightly different form than we have today, but hardware support nonetheless.
There are a lot of things we could do differently with a clean-slate design, and I think intermediate representations have a lot of merit (e.g., the radical performance improvements enabled partly by the radical architecture changes that intermediate formats allow in the GPU space are evidence this works) - but hardware address translation doesn't seem like the problem here.
What I'm suggesting is not a total clean slate. It could be done easily on current processors or current instruction sets and would be more an omission than a change to core architecture.
I wonder if doing it on current chips and just ignoring all the protection and remapping logic would have a performance benefit? Look at the boost you get on some databases with transparent hugepages, which kind of do that.
You only have to check that the memory address is not negative (kernel pointers are negative on x86-64). No extra memory access needed.
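In JIT-emitted code that could be as small as a sign test before the load; something like this (purely illustrative C, not from any real sandbox):

    /* On x86-64 Linux, kernel addresses sit in the upper ("negative") half of
     * the canonical address space, so a sandboxing JIT could guard loads with
     * a sign check - no extra memory access needed. */
    #include <stdint.h>
    #include <stdlib.h>

    static inline uint8_t checked_load(const void *p) {
        if ((int64_t)(uintptr_t)p < 0)   /* upper-half (kernel) address */
            abort();                     /* or trap back into the runtime */
        return *(const uint8_t *)p;
    }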
> You can't do any sort of static analysis I'm aware of that'll still allow you to run C code (which lets you manufacture pointers from arbitrary integers).
NaCl managed to do it (https://developer.chrome.com/native-client/reference/sandbox...).
As to NaCl, it relies on various CPU protection mechanisms, and also makes some major trade-offs: https://static.googleusercontent.com/media/research.google.c.... On x86, NaCl uses the segmentation mechanism. On x86-64, which has no segmentation registers, it masks addresses and requires all memory references to be in a 4GB space. To handle various edge cases, and to speed up stack references, it relies on huge guard areas on either side of the module heap and stack, thus relying on the virtual memory system. Finally, likely to mitigate the overhead of masking, it does not sandbox reads at all, and relies on the virtual memory system to protect secret browser information from the sandboxed process. Even with these limitations, on about half the SPEC benchmarks the overhead is 15-45%.
You could still toss a lot: virtualization, complex multi-layered protection modes, address remapping, and essentially every hardware feature that exists to support legacy binary code. All deprecated instructions and execution modes could go, etc.
Finally you would maintain the benefit of architecture flexibility. Switching from x86 to ARM, etc., would be easy.
Actually, doing a bit of searching, I think you originally could, but for some reason they removed it.
What you're referring to is probably C++/CLI, which wasn't removed, but it hasn't really been updated for a while. C++/CLI is a set of language extensions that make it possible to interface with the .NET object model.
If we go feature by feature, the .NET type system and bytecode has:
- unsigned types
- raw (non-GC) data pointers with pointer arithmetic
- raw function pointers (distinct from delegates)
- structs and unions
- dynamic memory allocation on the stack (like alloca)
- vararg functions
In reality, the ultimate source of this problem is the mismatch in speed between silicon logic and silicon memory. This is why your CPU ends up doing all sorts of tricks like caching, branch prediction, and speculative execution to compensate for slow memory.
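The size of that mismatch is easy to demonstrate: a pointer chase through a buffer much larger than cache makes every load a dependent trip to DRAM, and each one costs the core time in which it could otherwise retire hundreds of instructions. A rough sketch (mine, parameters arbitrary):

    /* Dependent loads through a single random cycle defeat caching and
     * prefetching, so each hop is roughly a full DRAM round trip - the gap
     * that caches, branch prediction and speculative execution exist to hide. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N (32u * 1024 * 1024)   /* 32M entries * 8 bytes = 256 MB */

    int main(void) {
        size_t *next = malloc((size_t)N * sizeof(size_t));
        for (size_t i = 0; i < N; i++) next[i] = i;
        srand(1);                               /* coarse randomness is fine here */
        for (size_t i = N - 1; i > 0; i--) {    /* Sattolo shuffle: one big cycle */
            size_t j = (size_t)rand() % i;
            size_t t = next[i]; next[i] = next[j]; next[j] = t;
        }

        struct timespec t0, t1;
        size_t p = 0;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (size_t i = 0; i < N; i++) p = next[p];   /* each load waits on the last */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
        printf("%.1f ns per dependent load (p=%zu)\n", ns / N, p);
        return 0;
    }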
Can Intel release a drop-in CPU replacement that avoids or mitigates this issue?
The infrastructure investment in Intel cores is huge. If a drop-in replacement lets me minimize downtime, regain performance, and is "cost effective" compared to a cost-prohibitive platform replacement, does this result in Intel having a sales INCREASE as it replaces the bad silicon?
I don't know enough about this issue to speak to it either way, but I would love to hear whether such a fix is possible/viable.
Don’t forget to correct for the subtle loss in credibility, and subsequent immeasurably subtle dip in sales, amortised over… forever.
Keep in mind, this is more or less just the 4GB/4GB patch set that floated around a while back for 32-bit systems, and that patch was never merged precisely because of the big performance impact it imposed - and that change actually had some merit; this one has none besides security. I don't think Linus would be letting this go through (and especially be on by default) without a fuss unless there is really no other way to mitigate a fairly big security hole. That's just my opinion, but it seems pretty clear to me. It's possible he has not spoken to anybody at Intel about this, but I would personally think he has some connections to get some info on it.
With a future OS-specific microcode patch, a config option would make more sense.
They just need to add a 2nd CR3 register to separate user from kernel space, or do the permission check before prefetches, like AMD does. On Linux this check would be cheap; on Windows NT, not so much.
Or maybe it could be that Intel privately disclosed already that no backport will be done to firmware of older CPUs, in which case the kernel update is the stopgap for newer generation and the solution for older generations.
BTW, removing the kernel from the non-privileged address space seems like such a great idea (which is not a new one at all) that the whole thing should probably have some hardware support to be made fast.
I don't think so, but it depends what you mean.
Kernel space and user space being separated isn't specific to a microkernel. The only reason the kernel is mapped in to each process is to avoid the TLB flush during syscalls. The pages themselves aren't actually accessible unless you're running in kernel mode (well, unless you're using hardware affected by this bug). So, in a non-broken system, kernel and user spaces are separated, even with a monolithic kernel (Linux).
> BTW, removing the kernel from the non-privileged address space seems like such a great idea (which is not a new one at all) that the whole thing should probably have some hardware support to be made fast.
For the most part, I agree. However, it really shouldn't be necessary if the virtual memory protection did what it was supposed to do. Mapping the kernel in to the process address space and using the page protection flags is an optimization that is perfectly legal from an architectural standpoint.
If you can't rely on the page protection flags to work, then you really can't rely on any other hardware feature to work either.
Is there enough spare capacity to cope with this? Will spot-instance prices go up? Will I need more instances of a given type to run the same workload?
But in any case, the KASLR bypass is not the main vulnerability here. KASLR is widely seen as too leaky to be really useful. Linux would not rush out a >5% performance hit just to fix one of the many leaks.
Overall, this has between 0.28% (best case application with barely any syscalls) and > 50% (du, which does lots of syscalls) impact on performance on Intel processors.
It doesn't speed up AMD hardware. Intel incurs a performance penalty (so it doesn't leave Intel the same as before), but that penalty doesn't make your AMD CPU magically go faster.
KASLR bypass is just a small bonus.
The author of that blog post is mentioned in the acknowledgments of the KAISER whitepaper. :)
I wonder if there are other (non-x86) CPUs that do similar speculative execution and are affected... the general ideas behind it don't seem to be specific to x86.
...but the blog post above shows that you need to execute instructions that (try to) access kernel addresses, and have a handler in place to catch the inevitable exception. That doesn't seem like code a JS JIT could generate.
You might be thinking of that JS RowHammer demonstration, but that was using regular memory accesses, not the specific kernel addresses you'd need for this.
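For the curious, the non-speculative half of that is easy to see from plain C: touching a kernel address from userspace faults, and the probing code needs a handler in place to survive it. A sketch follows; the address is just a typical kernel-text location on x86-64 Linux without KASLR, so treat it as a placeholder, and none of the actual cache side-channel machinery is shown.

    /* Touching a kernel address from userspace raises SIGSEGV, so the probing
     * code must catch it to keep going. (The actual exploit extracts the value
     * speculatively via a cache side channel before the fault is delivered;
     * none of that is shown here.) */
    #include <setjmp.h>
    #include <signal.h>
    #include <stdio.h>

    static sigjmp_buf env;

    static void on_segv(int sig) {
        (void)sig;
        siglongjmp(env, 1);               /* recover and carry on */
    }

    int main(void) {
        signal(SIGSEGV, on_segv);
        /* placeholder: a typical kernel-text address on x86-64 without KASLR */
        volatile char *kaddr = (volatile char *)0xffffffff81000000UL;

        if (sigsetjmp(env, 1) == 0) {
            char v = *kaddr;              /* faults: user mode can't read this */
            printf("read %d (should not happen)\n", v);
        } else {
            printf("caught SIGSEGV as expected\n");
        }
        return 0;
    }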
> but then they go back to leading a much more difficult online life than the rest of the world.
I completely disagree, because I don't have to routinely subject myself to the barrage of useless distracting noise (adverts and whatever else) caused by JS. https://news.ycombinator.com/item?id=10871967 (The rest of the comments on that item are worth reading too.)
Also, "JS off" is very much in agreement with "don't run untrusted code", something which everyone who cares about security in any way would have no problem with.
Yes, I advocate for JS being a part of the web. There are good reasons for it. But regardless, it has nothing to do with advocates. We are in this situation because browser vendors included JS and developers and users found it useful. Again, you are free to deny the idea that this is irreversible, but I am with the 99.99% of web users who have JS enabled.
Edit: QoL is subjective of course, but let me ask you this: when was the last time you really had JS enabled by default and how did you measure the trade off? My suspicion is that most people who put on their tin foil hat^W^W^W^W^W^Wturn off JS by default don't actually turn it back on frequently, and spend a whole lot of their lives fiddling with drop down menus to enable/disable JS on specific sites.
Even without special extensions and keyboard shortcuts, you spend very little time fiddling with menus. It seems like a lot only at the beginning, and it quickly drops to near no fiddling at all. But it also saves time on various things, like when your adblocker doesn't catch something and you have to close those clickunder-spawned popups or hunt for the content on a page full of ads; things also load faster, and so on.
I fully support not making content delivery rely on JS. But disabling JS because it can be used for intrusive ads is a lot like taking the wheels off your car because it can take you to the mall, where you might see big "for sale" signs and annoying sales people. Effective, but stupid.
You should try it sometime. Selectively enabling JS will be annoying at first, but as long as you save your preferences, the web will soon become a much less terrible place, and you'll rarely have to tweak your config. This approach won't work for non-techies, of course, but it's not much of a hardship for someone vaguely familiar with how the web works. Amazon, for example, works fine with a bit of JS not including amazon-adsystem.com.
Slack has an IRC gateway, though.
I'm not sure what the advantages of this argument are anymore. JS is now so ubiquitous I can only imagine how a drive-by JS exploit can truly mess you up in obscure ways despite the fact you browse the web with IE4.
And this is not gonna change before a huge paradigm shift in network protocols and network apps.
Accusing people who don't run browsers that expose them to a nasty bug of feeling an exaggerated sense of "smugness"?
This seems disrespectful of users.
Doesn't that violate HN guidelines?