I know that bugs happen and that there was nothing intentional on this one, but at times like this it's hard to hold at bay the temptation to call for a class-action lawsuit against Intel....
It's, however, really bad if you sell CPU cycles for a living. You just lost between 5 and 30% of your capacity. If you have a large building, you just lost part of your parking lot to the Intel Kernel Page Problem building.
Honestly, I'd just make sure the server firewalls are super tight and not take the future patches. At least for now.
I know I’m not alone.
Then again: think of microservices, Kubernetes for instance; network requests are system calls.
(In my case anyway)
30% overhead might be incentive to revisit the assumption that we can’t rewrite it for Linux.
I don’t know very much about computing on that scale, but I wonder if all the people selling off Intel stock are thinking this story through.
It's possible that the patches applied to fix this bug will cause some single-threaded benchmarks to change from Intel being the fastest to AMD being the fastest.
Not trying to kill expectations. This decision isn’t mine alone. You know the old saying “nobody got fired for buying Cisco”; that applies to Intel too.
Good security is about layers. No one layer can be assumed to be watertight, but with enough layers you hopefully get to a good place.
That's a good description of basically every cloud environment out there, from AWS on down.
In other words they are extremely common.
We'll start to get conscious about the number of syscalls we use on each operation, start using large buffers, start buffering stuff user-side...
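For instance, here's a minimal sketch (a made-up log-writing example, counts arbitrary) of what that user-side buffering buys you: the first loop pays one kernel entry per line, the second lets stdio batch them into a handful of large write()s.

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void) {
        const char *line = "some log line\n";

        /* naive: one write() syscall (one kernel entry) per line */
        for (int i = 0; i < 100000; i++)
            write(STDOUT_FILENO, line, strlen(line));

        /* buffered user-side: stdio flushes in large chunks, few kernel entries */
        for (int i = 0; i < 100000; i++)
            fputs(line, stdout);
        fflush(stdout);

        return 0;
    }

The second loop's cost barely changes under PTI because the kernel is entered orders of magnitude less often.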
TL;DR - queue behaviour gets nonlinear as you approach the theoretical max load. If you are running your processors at a high load, even a small change in code throughput makes a huge difference to real-world behaviour.
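To put rough numbers on that (a toy M/M/1 queueing model, not a claim about any particular workload): mean response time scales as service_time / (1 - utilization), so the same slowdown hurts far more the closer you already are to saturation.

    #include <stdio.h>

    int main(void) {
        double service_ms = 10.0;  /* hypothetical unloaded service time */
        double utilizations[] = {0.50, 0.70, 0.90, 0.95};
        for (int i = 0; i < 4; i++) {
            double rho = utilizations[i];
            /* M/M/1 mean response time: service_time / (1 - rho) */
            printf("utilization %2.0f%% -> mean response %5.0f ms\n",
                   rho * 100, service_ms / (1.0 - rho));
        }
        return 0;
    }

At 50% load that's 20ms; at 95% it's 200ms. Lose 30% of your throughput when you're already near the top of that curve and you're suddenly living on its steep part.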
30% is a big hit. I'm wondering if that isn't a bit exaggerated, or perhaps the consequence of a poorly optimized workaround that will rapidly improve. I recall seeing figures on the order of 3% only a few days ago.
How big it will be for your workload is a function of what your workload is. Benchmark if it is important to you.
Who really sells CPU cycles? Cloud providers sell instances priced per core. So the real hit is on the customers, since they have to shell out for more instances for the same amount of computing power.
The hit I see is on providers of 'serverless' computing, since they charge per request and have their margins reduced.
AWS, Azure, and GCP all bill serverless with a combination of per-request fees and compute (GB-seconds), so I'd expect the entire hit to be passed on to the user since this will cause increased compute time for each request. N requests that used to average 300ms each will now be N requests that average, say, 400ms, so the per-request billing remains the same and the compute billing will increase by approximately 30%.
Also, a 30% decrease is equivalent to setting Moore's law back about 7 months (assuming an 18-month doubling period, 18·log2(1.3) ≈ 7). A 5% loss is only setting it back 1 month. I know that's a bit of a naive calculation, but the point is computing power has long operated in an exponential domain, so big differences in absolute numbers aren't necessarily a big deal.
Well... Everyone who bought AMD. Some people managed to see beyond the hype and go for the option that made sense.
What hype are you referring to? Are you suggesting the people who bought AMD knew this was a problem for Intel?
Maybe a lot more now?
True, sometimes you will leave boxes at low utilisation for various reasons, e.g. to deal with traffic spikes. But those reasons have not gone away. So now instead of having a predictable increase in CPU cost, you have an unpredictable increase in performance snafus.
The only good news is that the real performance hit will be less than 30% on many workloads. Especially once the providers start juggling and optimising.
What do you mean by "compressible"?
Kernel ABIs will eventually reflect that and grow higher-level, more expensive calls that replace groups of currently cheap syscalls (which will become expensive after the fix).
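Linux already has a few calls in that spirit; sendmmsg() is an existing example, sending a batch of UDP datagrams with a single kernel entry instead of one sendto() per packet. A rough sketch (socket setup and destination are assumed, payloads made up):

    #define _GNU_SOURCE
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    /* send eight datagrams to dst with one syscall instead of eight */
    int send_batch(int sock, struct sockaddr_in *dst) {
        struct mmsghdr msgs[8];
        struct iovec iovs[8];
        char payloads[8][16];
        memset(msgs, 0, sizeof(msgs));

        for (int i = 0; i < 8; i++) {
            snprintf(payloads[i], sizeof(payloads[i]), "packet %d", i);
            iovs[i].iov_base = payloads[i];
            iovs[i].iov_len = strlen(payloads[i]);
            msgs[i].msg_hdr.msg_iov = &iovs[i];
            msgs[i].msg_hdr.msg_iovlen = 1;
            msgs[i].msg_hdr.msg_name = dst;
            msgs[i].msg_hdr.msg_namelen = sizeof(*dst);
        }
        return sendmmsg(sock, msgs, 8, 0);
    }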
And Intel will profit handsomely from next generation CPUs that'll get an instant up-to-30% performance boost for fixing this bug.
Not to be mean, but that's not what is being changed.
You're right on the bug - userlevel code can now read any memory regardless of privilege level. However the fix isn't to manually check the privileges on each access - that would be extremely slow and wouldn't actually fix the problem.
The fix is to unmap the kernel entirely when userspace code is running. Because the kernel will no longer be in the page table, the userspace code can no longer read it. The side-effect of this is that the page table now needs to be switched every time you enter the kernel, which also flushes the TLB and means that there will be a lot more TLB misses when executing code, which slows things down a lot.
So, to be clear, it is not accessing pages that is being slowed down, it is the switch from the kernelspace to the userspace.
All your points are right though. Page access times will in general be slower because of all the extra TLB flushes, leading to more TLB misses when accessing memory.
Or am I completely off the mark?
And how often the kernel services interrupts.
If people who received written assurance from Intel that their hardware is 100% bug-free can form a legal class, sure. I highly doubt there is even a single such customer.
What I meant was that the presence of the bug itself is not a valid cause, for example you can't claim that due to the error you lost 1 trillion dollars via a software hack - even if it's true. If Intel can prove they acted ethically when disclosing the bug and that they replaced / compensated users up to the value of the CPU, they are in the clear.
Yes and no. Yes, Intel would get a chance to claim that the case should be dismissed out of hand. To do that, they have to prove that, even assuming all the claimed facts are true, the people suing still don't have a valid case. That's a high bar. It can be reached - there's a reason that preliminary summary judgment is a thing in court cases - but it takes a really flawed case to be dismissed in this way.
How flawed? SCO v. IBM was not completely dismissed on preliminary summary judgment, and that was the most flawed case I've ever seen.
> It may very well be a question of who has the better legal team.
Well, Intel can afford to hire the best. A huge class-action suit can sometimes attract the best to the other side as well, though. (There's not just one "best", so there's enough for both sides of the same court case.)
IANAL, but it looks to me like there's at least the potential for a valid court case. CPUs are (approximately) priced according to their ability to handle workloads; if they can't provide the advertised performance, they didn't deserve the price they sold for.
The question is did anyone receive performance assurance from Intel? Probably not.
Some cloud providers or compute grids just lost a lot. Maybe they will find an angle to claim compensation.
Incorrect. It also affects interrupts and (page) faults.
Any usermode to kernel and back transition.
Hosting on bare metal will become more attractive. Too bad you can't long OVH and Hetzner.
What does that even mean?
Also Hetzner just introduced some AMD Epyc server.
As opposed to "shorting" a stock, which means making a bet that it will go down in value.
HN doesn't let you do this to new comments to avoid back-and-forth commenting that is typical in flamewars.
You can reply anyway, but you have to click on the timestamp ("X minutes ago") to do it.
The other benchmark that has generated some consternation is running 'du' in a nonstop loop.
Both of these situations are pathological cases and don't reflect real-world performance. My guess is a 5-10% performance hit on general workloads. Still significant, but nowhere near as bad as some of the numbers that are getting thrown around.
And databases are the worst-case scenario; most real-world applications are showing a 1% performance impact or less.
Your last link is all gaming benchmarks, which as the article mentions are not affected much.
(We should probably also stop overgeneralizing about the nature of computational workloads.)
This isn’t an excuse for Intel consistently having terrible verification practices and shipping horrendous hardware bugs. From 2015: https://danluu.com/cpu-bugs/ There have been more since then.
I’ve talked to multiple people who work in intel’s testing division and think “verification” means “unit tests”. The complexity of their CPUs has far surpassed what they know how to manage.
Found a quote:
"We need to move faster. Validation at Intel is taking much longer than it does for our competition. We need to do whatever we can to reduce those times… we can’t live forever in the shadow of the early 90’s FDIV bug, we need to move on. Our competition is moving much faster than we are".
Vendor, in conversation: "We're pretty sure we can make the next version do cache coherency correctly."
Me (paraphrased): "Don't let the door hit you in the ass on the way out."
Management chain chooses them anyway, I spend the next year chasing down cache-related bugs. Fun.
(I should remark that there are good reasons for this effort. Such as: It boots in under 500ms, it's crazy efficient, doesn't use much RAM, and your company won't let you use anything with a GPL license for reasons that the lawyers are adamant about).
So now you get to find all the places where the vendor documentation, sample code and so forth is wrong, or missing entirely, or telling the truth but about a different SOC. You find the race conditions, the timing problems, the magic tuning parameters that make things like the memory controller and the USB system actually work, the places where the cache system doesn't play well with various DMA controllers, the DMA engines that run wild and stomp memory at random, the I2C interfaces that randomly freeze or corrupt data . . . I could go on.
It's fun, but nothing you learn is very transferrable (with the possible exception of mistrust of people at big silicon houses who slap together SOCs).
There are hardware manufacturers that are better than others at being open and providing documentation. My minimal level of required support and documentation right now is mainline linux support.
Can you document your work publicly, or is there something I can read about it? I'm very interested in alternative kernels beside Linux.
When you buy an SOC, the /contract/ you have with the chip company determines the extent and depth of their responsibility. On the other hand, they do want to sell chips to you, hopefully lots of them, so it's not like they're going to make life difficult.
Some vendors are great at support. They ship you errata without you needing to ask, they are good at fielding questions, they have good quality sample code.
Other vendors will put even large customers on a tier-1 support by default, where your engineers have to deal with crappy filtering and answer inane questions over a period of days before getting any technical engagement. Issues can drag on for months. Sometimes you need to get VPs involved, on both sides, before you can get answers.
The real fun is when you use a vendor that is actively hiding chip bugs and won't admit to issues, even when you have excellent data that exposes them. For bonus points, there are vendors that will rev chips (fixing bugs) without revving chip version identifiers: Half of the chips you have will work, half won't, and you can't tell which are which without putting them into a test setup and running code.
My favorite ARM experience was where memcpy() was broken in an RTOS for "some cases". "some cases" turned out to be when the size of the copy wasn't a multiple of the cache line size. Scary stuff.
As other comments suggest, there might be a third stage, completely forgetting how to design and validate chips properly.
The same reason could have been used to give the NSA some legroom for instance, but tell everyone that's why they won't do so much verification in the future.
Furthermore, I just read an article (can't find the link) that certain ARM Cortex cores have the same issue as Intel.
More likely "good enough" is much lower because ARM users aren't finding the bugs. The workloads that find these bugs in Intel systems are: heavy compilation, heavy numeric computation, privilege escalation attackers on multi-user systems. Those use cases barely exist on ARM: who's running a compile farm on ARM, or doing scientific computation on an ARM cluster, or offering a public cloud running on ARM?
Overall it’s a depressing story of predictable market failure as well as internal misbehavior at Intel, if true. Few buyers want to pay or wait for correctness until a sufficiently bad bug is sufficiently fresh in human memory. And if you do want to, it’s not as if you’re blessed with many convenient alternatives.
> The AMD microarchitecture does not allow memory references, including speculative references, that access higher privileged data when running in a lesser privileged mode when that access would result in a page fault.
Out-of-order processors generally trigger exceptions when instructions are retired. Because instructions are retired in-order, that allows exceptions and interrupts to be reported in program order, which is what the programmer expects to happen. Furthermore, because memory access is a critical path, the TLB/privilege check is generally started in parallel with the cache/memory access. In such an architecture, it seems like the straightforward thing to do is to let the improper access to kernel memory execute, and then raise the page fault only when the instruction retires.
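The reason that matters is that the speculatively-loaded value leaves a footprint in the cache before the fault is ever raised. A stripped-down flush+reload probe (the timing side channel these attacks use to recover that value; here the "secret" is just a local variable standing in for the speculatively-read byte, so nothing privileged is actually touched) looks roughly like this:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <x86intrin.h>

    static uint8_t probe[256 * 4096];

    int main(void) {
        uint8_t secret = 42;       /* placeholder for a speculatively-read byte */
        volatile uint8_t tmp;
        memset(probe, 1, sizeof(probe));

        /* flush every probe line out of the cache */
        for (int i = 0; i < 256; i++)
            _mm_clflush(&probe[i * 4096]);

        /* the "transient" access: this caches exactly one probe line */
        tmp = probe[secret * 4096];

        /* reload: the index that comes back fastest reveals the secret */
        int best = -1;
        uint64_t best_time = UINT64_MAX;
        for (int i = 0; i < 256; i++) {
            unsigned aux;
            uint64_t t0 = __rdtscp(&aux);
            tmp = probe[i * 4096];
            uint64_t t1 = __rdtscp(&aux);
            if (t1 - t0 < best_time) { best_time = t1 - t0; best = i; }
        }
        printf("recovered byte: %d\n", best);
        return 0;
    }

In a real Meltdown-style exploit the "transient" access would be the faulting read of a kernel address executed under speculation, with the fault suppressed or caught afterwards; the reload phase is the same.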
If it, like it seems, is just an attack on OS kernels and PV hypervisors, you can simply turn off the mitigation, since nowadays kernel security is mostly useless (and Linux is likely full of exploitable bugs anyway, so memory protection doesn't really do that much other than protecting against accidental crashes, which isn't changed by this).
Even if it's an attack against hypervisors any large deployment can simply use reserved machines and it won't have a significant cost.
Well, if I rent a VPS with x performance, I still expect x performance after this flaw is patched. The company providing the virtual machine will perhaps have to pay 30% more to provide me with the same product I've been getting.
Since most VPS offerings arbitrage shared resources, this will not increase costs of providing VPSes by the full performance penalty.
So you may suddenly find that your own performance requirements, that were previously satisfied by "2x m5.xlarge" are no longer being met by that configuration, and I doubt AWS will just provide you with more resources at no additional charge.
Are there any providers that state you will get x performance? Most that I've seen say you will get m processors, n memory, and p storage, but don't make any guarantees about how well those things will perform.
On the other hand, shrinking Intel's market share due to bad PR and thus adding some competition into the industry could actually foster that progress.
The bigger issue is for things that don't scale easily. That sql server that was at 90% capacity is suddenly unable to handle the load. Sure that could've happened organically, but now it happens (perhaps literally) overnight for everyone all at once.
Expect a bunch of outages in the next few weeks as companies scramble to fix this.
Just wanna point out that a 30% performance hit means a 43% cost increase.
For those confused: a 30% decrease puts you at 70% of your original capacity. Adding 30% back only gets you to 91% (0.70 × 1.3 = 0.91); since 1/0.7 ≈ 1.43, you need about 43% more to get back to 100%.
It seems you need root or physical access to the system as a prerequisite for the attack.
Where that gets tricky is when everyone's using cloud hosting solutions where the physical machines are abstracted away, and a given physical server may be running multiple virtual servers for different customers.
Think of it like this:
* Somewhere in a data center at a cloud provider is a physical server, wired up in a rack.
* That server runs virtualization software, allowing it to host Virtual Server 1, Virtual Server 2, and Virtual Server 3.
* Virtual Server 1 belongs to Customer A. Virtual Servers 2 and 3 belong to Customer B.
* Normally, Virtual Server 1 can't access any memory allocated to Virtual Servers 2 and 3.
* BUT: Customer A can now use Meltdown to read the entire memory of the physical server. Which includes all the memory space of Virtual Servers 2 and 3, exposing Customer B's data to Customer A.
That's the threat here.
Our Elasticsearch nodes all had 32 GB of RAM, we had 10 of them, and they were all being pushed to the max.
Something like this would be a massive hit, requiring a lot more work to identify new bottlenecks and scale up appropriately.
The first "instruction" of your program is the last address on the stack, in the list of addresses you pushed to the stack.
You are executing code, but you did not inject any executable code, you did not need to modify any existing code pages (which are probably read only), you did not need to attempt to execute code out of a data page (which is probably marked non executable).
Address Space Layout Randomization is a way to prevent the "return oriented programming" attack. When a process is launched, the address space is randomly laid out so that the attacker cannot know which address in memory the std C lib printf function will be located at -- in this process.
Now let's think about the kernel. If you could know all of the addresses of important kernel routines, you could potentially execute a "return oriented programming" attack against the kernel with kernel privileges. Without modifying or injecting any kernel level code. These hardware vulnerabilities allow user space code to deduce information about kernel space addresses.
Now that's a lot of hoops to jump through in order to execute an attack. But there are people prepared to expend this and even more effort in order to do so. Well funded and well staffed adversaries who would stop at nothing in order to access more and better pr0n collections.
> If you could know all of the addresses of important kernel routines, you could potentially execute a "return oriented programming" attack against the kernel with kernel privileges. Without modifying or injecting any kernel level code.
The user <-> kernel transition is mediated (on x86-64) with the SYSCALL instruction, which jumps to a location specified by a non-user writable MSR. How does return-oriented programming work in that case?
The crucial point here being that there must already be an existing overflow vulnerability in the kernel. Knowing all the addresses is no use if you can't force execution to go to them.
AIUI, the present circumstances are:
- there exists a Xen security embargo that expires Thursday that might be unrelated
- AWS and Azure have scheduled reboots of many things for maintenance in the next week, which seems unlikely to be unrelated to the Xen embargo
- a feature that appears to be geared toward preventing a side-channel technique of unknown power has been rushed into Linux for Intel-only (both x86_64 and ARM from Intel)
- a similar class of prevention technique has been landed in Windows since November for both Intel and AMD x86_64 chips (no idea about ARM)
- the rush surrounding this, and people being amazingly willing to land fixes that imply a 5-30% performance impact, strongly suggest that unlike almost every major CPU bug in the last decade, you can't fix or even work around this with a microcode update for the affected CPUs, which is _huge_. The AMD TLB bug, the AMD tight loop bug that DFBSD found, even the Intel SGX flaws that made them repeatedly disable SGX on some platforms - all of them could be worked around with BIOS or microcode updates. This, apparently, cannot. (Either that or they're rushing out fixes because there's live exploit code somewhere and they haven't had time to write a microcode fix yet, but O(months) seems like they probably concluded they outright can't, rather than haven't yet.)
- Intel issued a press release saying they planned to announce this next week after more vendors had patched their shit, which lends me more cause to believe that the Xen bug might be the same one 
- Intel claims in the same PR that "many types of computing devices — with many different vendors’ processors" are affected, so I'll be curious to see whether non-Intel platforms fall into the umbrella soon
- macOS implemented partial mitigations in 10.13.2 and apparently has some novel ones coming up in 10.13.3 
- someone reasonably respected claims to have a private PoC of this bug leaking kernel memory 
- ARM64 has KPTI patches, but they aren't in Linus's tree as of this writing
- all the other free operating systems appear to have been left out of the embargoed party (until recently, in FBSD's case), so who knows when they'll have mitigations ready 
- So far, Microsoft appears to have only patched Windows 10, so it's unknown whether they intend to backport fixes to 7 or possibly attempt to use this as another crowbar to get people off of XP 2.0
- Update: Microsoft is pushing an OOB update later today that will auto-apply to Win10 but not be forced to auto-apply on 7 and 8 until Tuesday, so that's nice 
 - https://newsroom.intel.com/news/intel-responds-to-security-r...
 - https://twitter.com/aionescu/status/948609809540046849
 - https://twitter.com/brainsmoke/status/948561799875502080
 - https://patchwork.kernel.org/patch/10095827/
 - https://lists.freebsd.org/pipermail/freebsd-security/2018-Ja...
 - https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...
 - https://www.theverge.com/2018/1/3/16846784/microsoft-process...
Seems that Google/Project Zero felt the need to go ahead and break embargo. Worth adding to the above list of news sources.
If you read the article you quoted:
> We are posting before an originally coordinated disclosure date of January 9, 2018 because of existing public reports and growing speculation in the press and security research community about the issue, which raises the risk of exploitation. The full Project Zero report is forthcoming (update: this has been published; see above).
Just from public Googling, I believe it may have been the Register who tried to get in on the scoop, and broke the embargo:
My checking doesn't show any of those three explicitly listed in Apple's security updates up through 10.13.2/2017-002 Sierra.
Also, RUH-ROH. https://twitter.com/brainsmoke/status/948561799875502080
ASLR, PIC (position-independent code: chunks of the binary move around between executions), and RELRO (changing the order and permissions of an ELF binary's headers; a common ROP pattern is to set up a fake stack frame and call a libc function in the ELF's Global Offset Table) are all mitigations against ROP, but none solve the underlying problem.
The reason ROP exists is that x86-64 uses a von Neumann architecture, which means that the stack necessarily mixes code (return addresses) and data. The only true solution is an architecture that keeps these separate, such as Harvard architecture chips.
As for bypassing the aforementioned mitigations...
ASLR: Only guarantees that the base address changes. Relative offsets are the same. So to be able to call any libc function in a ROP chain, all you need is a copy of the binary (to find the offsets) and to leak any libc function address at runtime. There are a million ways for this data to be leaked, and they are often overlooked in QA. Once you have any libc address, you can use your regular offsets to calculate new addresses.
PIC: haven't yet dealt with it myself, but you can use the above technique to get addresses in any relocated chunk of code, but I think you'll need to leak two addresses to account for ASLR and PIC.
RELRO: This makes the function lookup table in the binary read only, which doesn't stop you from calling any function already called in the binary. Without RELRO, you can call anything in libc.so I think, but with RELRO you can only call functions that have been explicitly invoked. This is still super useful because the libc syscall wrappers like read() and write() are extremely powerful anyway. Full RELRO (as opposed to partial RELRO) makes the procedure linkage table read only as well, which makes things harder still.
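To make the ASLR point above concrete, the arithmetic once you've leaked a single libc address is just base-plus-offset (every address and offset below is made up for illustration; the leak itself is assumed):

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        /* leaked at runtime through some info-leak bug */
        uintptr_t leaked_printf = 0x7f1234574e80;

        /* fixed offsets taken from a local copy of the same libc
           (e.g. with "nm -D libc.so.6") */
        uintptr_t offset_printf = 0x054e80;
        uintptr_t offset_system = 0x04f440;

        uintptr_t libc_base = leaked_printf - offset_printf;
        uintptr_t system_at = libc_base + offset_system;

        printf("libc base: %#lx\n", (unsigned long)libc_base);
        printf("system():  %#lx\n", (unsigned long)system_at);
        return 0;
    }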
If this is the kinda thing that interests you, I heartily recommend ropemporium.com, which has a number of ROP challenge binaries of varying difficulty to solve. If you're not sure where to start, I also wrote a write-up for one of the simpler challenges that is extremely detailed, and should be more than enough to get you started (even if you have no experience reversing or exploiting binaries).
Disclaimer: I'm just some dipshit that thinks this stuff is fun; if I've made a mistake in the above please let me know. I also haven't done any ROP since I wrote the linked article, so I'm probably forgetting stuff.
Are those kernel logical addresses?
This seems very wrong. I'm not aware of any privilege isolation in Windows relying on the secrecy of any value. Security tokens have opaque handles for which "guessing" makes no sense. Are you aware of anything?
1. Read the root ssh private key from the OpenSSH daemon's kernel pages maintaining the crypto context and ssh into the system
2. Read a sudo auth key generated for someone using sudo and then use that to run code as a root user
3. Read users' passwords whenever a session manager asks them to reauth
4. If running in AWS/GCP inside a container/VM meant to run untrusted code, read the cloud provider's private keys and get control of the account
5. RCE to ROP powered privilege escalation exploit seems reasonable...
6. Rowhammer a known kernel address (since you can now read kernel memory) to flip some bits to give you root
Also remember running JS is basically RCE if you can read outside the browser sandbox, ads just became much more dangerous...
Incidentally, this seems to indicate that zero-copy I/O is actually a security improvement as well, not just a performance improvement?
I am not really sure how/if zero copy may/may not solve this problem.
If this bug only allows reading kernel pages, zero copy may actually help, since the unprivileged user can't read your pages; but from the small amount of available description it looks like it can read any page. Kernel pages are just more interesting because that's a ring lower, which is why all the focus is on that.
I am fairly certain there is more protection against reading memory owned by a process at a lower ring level, so zero copy may be a bad idea for security-critical data.
And based on the disclosure that Google published, it looks like any memory can be read.
I'm cursed when it comes to timing. It's like when I bought that house in 2007, held onto it waiting for the market to recover, then tried to sell it only to find out my tenants had been using it to operate a rabbit-breeding business for years and completely trashed the place (thank you, useless property manager), forcing me to sell it at a loss anyway (6 months ago).
Also, I hate rabbits now. And I veered off topic, sorry.
Well, I guess you're not the right person to talk to about a great ninja-rockstar position at our new RaaS startup.
/one has to joke sometimes to avoid crying over taking a 30% hit in costs... over a stupid CPU bug
sudo cat /dev/mem
I'm having a hard time understanding why this is worse than any other local root escalation bug except for the consequences of the necessary patch.
EDIT: I see that /dev/mem is no longer a window on all of physical RAM in a default secure configuration. Is it true that there's no way for root to read kernel memory in a typical Linux instance? If so, the severity of this issue makes more sense to me.
> I'm having a hard time understanding why this is worse than any other local root escalation bug except for the consequences of the necessary patch.
It's not, as far as I'm aware. The fact that the patch has perf consequences is why it's such a big deal.
a) It would allow any non-root process to read full memory, including the kernel and other processes, or
b) It would allow one cloud VM to read full memory of other cloud VMs on the same physical machine, or
Based on all the hoopla around the Linux kernel patches, the thinking is: yes it can. Or VM escape. Or both.
>>attack that would be almost expected in a processor with speculative execution unless special measures were taken to prevent it.
If you're going to put in features with expected attacks, you should definitely be putting in features to prevent them, and if it is an expected attack it shouldn't require special measures; prevention should just be an inherent part of introducing the feature.
Doesn't multi-user timesharing and virtualization predate every modern CPU and OS though?
At first, computers were very expensive, and so were shared between many users. Mainframes, UNIX, dumb terminals, etc.
Then computers became cheap. Users could each have their own computer, and simply communicate with a server. Each business could have their own servers co-located in a datacenter.
Then virtualization got really good, and suddenly cloud servers became viable. You didn't have to pay for a whole server all the time, and if demand rapidly increased you didn't need to buy lots of new hardware. And if demand decreased you didn't get stuck with tons of useless hardware.
The second stage (dedicated servers) was the case when speculative execution was implemented. We're currently in the third stage, but Intel haven't changed their designs.
Indeed, this reminds me of cache-timing attacks, which probably can be done on every CPU with any cache at all --- and they've never seemed to be much of a big deal either.
The thing is, AMD probably very narrowly just missed this one --- if they did more aggressive speculative execution, they would be the same.
I'm wondering whether ARM chips are affected and, if they are, whether they are uniformly affected or whether it depends on vendor implementation choices.
I'm also wondering if/hoping for a fix that involves increased memory usage instead of reduced speed.
(I apologise if this is blindingly obvious for somebody well versed in low-level programming.)
Ironically, in human populations it produces the opposite effect.
Edit: since https://news.ycombinator.com/item?id=16063749 makes it clear that you're using HN for that purpose, which is not allowed here, I've banned this account. Would you please not create accounts to break the site guidelines with?
At least we can play our sorrows away.
This sounds like it’s positively evil for outfits that rely heavily on virtualisation also.
There isn't a guarantee that will compensate for that any more than if you updated some piece of your software infrastructure to a new version that just got slower.
And while databases try to minimize the number of syscalls, they still end up doing a lot of them for reads, writeout, and flushes.
Intel has already dropped and AMD is up. Maybe there's more to move, but first-order effects are at least partially priced in already.
But what about second-order effects? Seems like virtualization should be vulnerable (VMWare and Citrix), but maybe they actually benefit as customers add more capacity.
Software-defined networking and cloud databases should also suffer though it's unclear how to trade these.
AWS, Google Cloud and Azure might benefit as customers add capacity but there's no way to trade the business units. So what about cloud customers where compute costs are already a large percentage of total revenue?
Netflix should be OK but Snap and Twilio could get squeezed hard. Akamai and Cloudflare might have higher costs they can't pass through to customers.
And where's the upside? Who benefits? If the performance hit causes companies to add capacity, maybe semiconductor and DRAM suppliers like Micron would benefit.
I owned Intel back in 2010 when they bought McAfee for 7.8 billion. They said the future of CPUs and chip tech was embedding security on the chip. The real answer was mobile and GPUs.
Not only did I immediately know this was a horrendous deal, it clearly showed that the CEO and management had no clue about their own market's desires and direction. At the time, I was hoping they were going to buy Nvidia; it would have been a larger target to digest at 10 bil, but doable by Intel at the time.
The McAfee purchase turned out to be one of the worst large corporate purchases in history. Had they invested the $7.8 billion blindly into an S&P 500 index fund, their investment would be worth ~19-20 billion.
Add on top of that the fact that a lot of cloud customers over-provision (there are good scientific papers on how much spare CPU capacity there is). Cloud service providers that sell things on a per-request / real-CPU-usage model (vs reserved capacity) probably benefit more.
Also, you can't just separate trading in AWS or GCE from the rest of the core business.
Potentially, business units of Dell, HP, IBM, ... should do better, as people use this as a justification to upgrade overdue hardware and to cover 5% to 10% lower performance (needing more units to cover that).
But this will require you to have the right kind of flash storage, right kind of fs, right kind of mount options, and probably a different code path in userspace for DAX vs traditional storage.
So we're a little ways away from this.
That doesn't move anything from kernel land into userspace, certainly not in the app's process in userspace anyway.
There's no talk in the DAX information about how this results in a zero-syscall filesystem API, and I'm not seeing how that would ever work given there would then be zero protections on anything. You need a handle, and that handle needs security. All of that is done today via syscalls, and DAX isn't changing that interface at all. So where is the API to userspace changing?
This work is experimental, but you can mmap a single file on a filesystem on this device using the new DAX capabilities. Most access will no longer require a syscall.
This comes with all the usual semantics and trappings of mmap, plus some additional caveats as to how the filesystem / DAX / hardware is implemented. Most reads/writes will not require a trip to the kernel using the normal read()/write() syscalls. Additionally, there is no RAM page cache backing this mmap; instead the device is mapped directly at a virtual address (like DMA).
Finally, flush for these kinds of devices is at the block level implemented using normal instructions and not fsync. Flush is going to be done using the CLWB instruction. See: https://software.intel.com/en-us/blogs/2016/09/12/deprecate-...
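Putting those pieces together, my rough understanding of what the user-side code would look like (a sketch only: the mount point is made up, MAP_SYNC is the new 4.15-era flag for DAX-backed mappings and may need newer headers, and the CLWB intrinsic needs a CPU and compiler flags that support it):

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <immintrin.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("/mnt/pmem/data", O_RDWR);   /* file on a DAX filesystem */
        if (fd < 0) return 1;

        size_t len = 4096;
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
        if (p == MAP_FAILED) { close(fd); return 1; }

        /* plain store straight to the device mapping: no read()/write() syscall */
        strcpy(p, "hello persistent memory");

        /* make it durable with cache-line writeback + fence, not fsync() */
        for (size_t off = 0; off < strlen(p) + 1; off += 64)
            _mm_clwb(p + off);
        _mm_sfence();

        munmap(p, len);
        close(fd);
        return 0;
    }

The point being that after the initial open()/mmap(), the hot path never enters the kernel, which is exactly what you want if every kernel entry just got more expensive.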
LWN.net has lots of articles and links in their archives from 2016/2017. It's a really good read. Sadly I do not have time to dig more of them up for you. Do a search for site:lwn.net and search for DAX or MAP_DIRECT.
As in, if you call read/write instead of using mmap you're still getting a syscall regardless of whether DAX is supported or not. Not everything can use mmap. mmap is not a direct replacement for read/write in all scenarios.
The reply to that comment is accurate: that's a pathological case. Probably an order of magnitude off.
Real-world use cases introduce much more latency from other sources in the first place.
I'm sticking with an expectation in the 2%-5% range.
In the real world, where code is doing real things besides just entering/exiting itself all day, I think it's going to be a stretch to see even a 5% performance impact, let alone 10%.
But overall, yeah.
The reality is that OLTP databases' execution time is not dominated by CPU computation but instead by IO time. Most transactions in OLTP systems fetch a handful of tuples. Most time is dedicated to fetching the tuples (and maybe indices) from disk and then sending them over the network.
New disk devices lowered the latency significantly while syscall time has barely gotten better.
So in OLTP databases I expect the impact to be closer to 10% to 15%. So up to 3x over the base case.
The first set of numbers isn't actually unrealistic. Doing lots of primary key lookups over low latency links is fairly common.
The "SELECT 1" benchmark obviously was just to show something close to the worst case.
Latency through loopback on my machine takes 0.07ms. Latency to the machine sitting next to me is 5ms.
We're actually (and to think, today I trotted out that joke about what you call a group of nerds--a well, actually) talking multiple orders of magnitude through which kernel traps are being amplified.
Uh, latency in local gigabit net is a LOT lower than 5ms.
> We're actually (and to think, today I trotted out that joke about what you call a group of nerds--a well, actually) talking multiple orders of magnitude through which kernel traps are being amplified.
I've measured it through network as well, and the impact is smaller, but still large if you just increase the number of connections a bit.
--Oracle's marketing tomorrow, probably
(to their credit, SPARC does fully isolate kernel and user memory pages, so they were ahead of the curve here... for all 10 of their users who run anything other than Oracle DB on their systems.)
The most obvious issue with this benchmark is that Phoronix is testing the latest rcs, with all of their changes, against the last stable version [EDIT: I misread or this changed overnight, see below] that doesn't have PTI integrated, instead of just isolating the PTI patchset. The right way to do this would be to use the same kernel version and either cherry-pick the specific patches or trust that the `nopti` boot parameter sufficiently disables the feature. That alone makes the test worthless.
There is no way this causes a universal 30% perf reduction, especially not for workloads that are IO-bound (i.e., most real-world workloads). This is a significant hit for Intel, but it's not going to reduce global compute capacity by 30% overnight.
EDIT: Looking at the Phoronix page, the benchmark actually appears to use 4.15-rc5 as "pre" and 4.15-some-unspecified-git-pull-from-Dec-31-that-isn't-called-rc6 as "post". I thought I had read 4.14.8 there last night, but may not have. Regardless, the point stands -- these are different versions of the kernel and the tests do not reflect the impact of the PTI patchset.
I'm saying that it's not a reliable measurement of the impact of the PTI patchset. There was a PgSQL performance anecdote (actually tested with the real boot parameters instead of entirely different versions of the kernel) that showed a 9% performance decrease posted to LKML, which Linus described as "pretty much in line with expectations".
Quoting further from that mail:
> Something around 5% performance impact of the isolation is what people are looking at.
> Obviously it depends on just exactly what you do. Some loads will hardly be affected at all, if they just spend all their time in user space. And if you do a lot of small system calls, you might see double-digit slowdowns.
So in general, the hit should be around 5%, and "[y]ou might see double-digit slowdowns" seems like the hit on a worst-case workload is hovering closer to the 10% range than 30%. That's also what the anecdote from LKML shows, unlike Phoronix which shows 25%-30% or worse.
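If you want to see where your own workload sits between those two extremes, the per-entry cost itself is easy to measure: something like the toy loop below, run once normally and once booted with `nopti`, shows the raw syscall round-trip time (numbers will vary a lot by CPU and kernel).

    #include <stdio.h>
    #include <sys/syscall.h>
    #include <time.h>
    #include <unistd.h>

    int main(void) {
        const long iters = 1000000;
        struct timespec start, end;

        clock_gettime(CLOCK_MONOTONIC, &start);
        for (long i = 0; i < iters; i++)
            syscall(SYS_getpid);            /* forces a real kernel entry each time */
        clock_gettime(CLOCK_MONOTONIC, &end);

        double ns = (end.tv_sec - start.tv_sec) * 1e9
                  + (end.tv_nsec - start.tv_nsec);
        printf("%.0f ns per syscall\n", ns / iters);
        return 0;
    }

Multiply the delta between the two runs by how many syscalls per second your application actually makes and you get a much better estimate than any generic benchmark.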
This is more of an attrition thing than a staggering loss. With people saying MS patched this in November, it would be interesting to see if people saw a similar 5-10% degradation in Windows benchmarks since that time.
>How often do companies release performance downgrades of that scale?
I don't know which "company" you're referring to here, but substantial changes in kernel performance characteristics are pretty common during the Linux development/RC process, and yes, definitely some workloads will often see changes +/- 10% between the roughly bi-monthly stable kernel releases.
If you're surprised that Linux development is so "lively", you're not alone. That's one of the selling points of other OSes like FreeBSD.
This is an all-hands-on-deck kind of situation. Apple doesn't usually do well with security fire drills like this.