AMD processors are not subject to the types of attacks that the kernel
page table isolation feature protects against. The AMD microarchitecture
does not allow memory references, including speculative references, that
access higher privileged data when running in a lesser privileged mode
when that access would result in a page fault.
Disable page table isolation by default on AMD processors by not setting
the X86_BUG_CPU_INSECURE feature, which controls whether X86_FEATURE_PTI
is set.
I guess Intel decided to speculate data accesses regardless of the privilege level of the target address, on the theory that whatever has been successfully speculated can't actually be accessed before the permissions are really checked, and somebody found a bug (or, given that all Intel processors are tagged as insecure, maybe a quasi-architectural hole) that lets you read the speculated data, or a significant subset or trace of it.
My wild guess is that you can read a good portion (if not all) of the whole computer's memory, or at least a significant subset or trace of it, from unprivileged userspace programs.
One possible vector suggested by Matt Tait (pwnallthethings) on Twitter: if speculative operations can influence what the processor does with the cache, the results can be observed with cache timing attacks. If the branch predictor reads the results of speculative operations, it's real easy, as he suggests here:
My guess is that the cache tags or TLB entries loaded on failed speculative accesses are wrong (maybe the valid bit is set but the address wasn't changed, or the user/supervisor protections are munged); that could leave you with a cache line or page tagged as user-accessible but actually holding protected kernel data.
Perhaps an interaction with conditional branches and cache timing? You could then extract individual bits from privileged memory by having the CPU pull in specific cache lines as part of a speculatively executed branch on the value of the bit.
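A minimal sketch of the probing half of such an attack (just the flush-and-time part; the speculative gadget is only simulated here, and `probe` is a made-up name; the intrinsics assume GCC/Clang on x86):

    #include <stdint.h>
    #include <stdio.h>
    #include <x86intrin.h>   /* _mm_clflush, _mm_mfence, __rdtscp */

    static volatile char probe[2][4096];   /* one cache line per bit value, a page apart */

    /* Time a single load; a line that is already cached reads back much faster. */
    static uint64_t time_load(volatile char *p)
    {
        unsigned aux;
        uint64_t t0 = __rdtscp(&aux);
        (void)*p;
        _mm_mfence();
        return __rdtscp(&aux) - t0;
    }

    int main(void)
    {
        /* 1. Flush both probe lines out of the cache. */
        _mm_clflush((const void *)&probe[0][0]);
        _mm_clflush((const void *)&probe[1][0]);
        _mm_mfence();

        /* 2. Here the hypothetical speculative gadget would touch
         *    probe[bit][0], where `bit` is a value it should never have
         *    been able to read architecturally.  Simulated: */
        int secret_bit = 1;
        (void)probe[secret_bit][0];

        /* 3. Recover the bit by timing both probe lines. */
        uint64_t t0 = time_load(&probe[0][0]);
        uint64_t t1 = time_load(&probe[1][0]);
        printf("guessed bit = %d (t0=%llu, t1=%llu)\n", t1 < t0,
               (unsigned long long)t0, (unsigned long long)t1);
        return 0;
    }

The real question, of course, is whether the speculative half in step 2 exists at all; the timing half is old news.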
Could this be possibly fixed with a microcode update in the future (at least for most recent models for which Intel is still bound to some kind of warranty)?
Probably not, I think; it may be too fundamental. But perhaps there is a mitigation if only a few instructions cause the leak and those can be disabled in speculation. The performance hit could be bad though, worse than the Linux fix.
OK, so it seems that Intel CPUs do some speculative execution on privileged data from unprivileged code, including from (at least some, and at least part of) separate following instructions. Given the microarchitectural complexity and the already well-known side-channel attacks, I would not be surprised at all if someone just finished the work and demonstrated that you can actually read privileged data with decent reliability. This might not even be very hard. You can think of prefetching, the TLB, the cache, hyperthreading, and any combination of those and other features. I'm 90% convinced there is no way Intel managed to close all the side channels on such a complex architecture, so if they really do carry speculative execution this far, I think they (well, actually we :p ) are owned.
Note that under Linux x86-64, IIRC, the whole physical memory is mapped in kernel pages... With a bit of tuning, if this theory is correct, I don't see why we could not read all of it. Might be the same under Windows.
I've not checked in depth yet, but it could match all the technical facts we have: a very important bug whose semi-rushed workaround with a high performance impact will be backported; it affects general-purpose OSes but IIRC does not affect some hypervisors (I guess they already don't map other guests' pages at all while one is running); it does not affect AMD (or at least not this way, and KPTI cannot fix it for them) because of their microarchitecture; and it involves a data leak.
> Intel CPUs do some speculative execution on privileged data from unprivileged code, including from (at least some, and at least part of) separate following instructions.
That has been known since Pentium 3 times; I wonder why nobody thought of this as a wonderful exploit target before.
> I'm 90% convinced there is no way Intel managed to close all the side channels on such a complex architecture
This is why one should not use Verilog, and should run formal validation of anything and everything related to HDL code.
What should one use instead? VHDL, Bluespec, Chisel? I don't know of any solution that makes verification significantly easier compared to SystemVerilog.
Yes, mathematical verification. Normally, for any sufficiently expensive tape-out, a partial mathematical verification is done with the tooling at hand, in addition to brute-force simulation.
Ideally, a verification much closer to complete should be done in addition to simulation, like a mathematical proof that the register contents can never be in set A if the inputs are in set B.
> This basically adds another set of tools to the architectural-level attack toolbox. From reading this I expect we'll see some interesting developments in the future.
And of course there's an obligatory comment beginning with "this cannot possibly work... "
> And of course there's an obligatory comment beginning with "this cannot possibly work... "
It's a good point about the TLB and VIPT, but I don't think this closes the whole class of potential issues; if too much speculative execution is performed (even just a little bit, even just with some obscure instructions) on anything data-dependent that has been speculatively loaded before the privilege check, I guess it's possible to recover the data.
It has nothing to do with AVX AFAICT. They are going to set X86_BUG_CPU_INSECURE [1] on all public Intel CPUs, including old ones that lack AVX (Core2, etc).
[1] I do hope they rename that flag to something like X86_BUG_NEEDS_PTI as the current name is bloody abysmal: way too broad and ambiguous, as if this is the only insecurity that ever has or ever will impact x86.
Well, that very name hints that it is probably a very annoying bug, with an impact way more annoying than a simple ASLR address leak or data-access trace (hell, Intel does not even consider side-channel data-access / TLB tracing of SGX a vuln, so this has to be worse than that kind of issue...)
I had a lingering account that cost me 3 cents a month. All of a sudden I get an email freaking out about all these alarm thresholds I'm blowing past. I just outright closed my entire account because trying to delete the offending S3 entities would fail without an error message.
not sure on exact details, but I received one as well, on a free-tier account I had sitting around with an empty dynamodb table that was showing very high projected usage. it was enough that I logged in immediately thinking that account had been hacked.
nope, still empty table, deleted it and went to bed. glad I wasn't the only one who got that.
Amazon sent me an email this morning (Jan 3, 2018) to say I have a bill but I had already cancelled my free account. I can't see what I owe (hopefully zero dollars) because my account is now non-existent.
One tiny Ubuntu instance I barely used and then put on standby. Then terminated.
This may or may not be related, but there is a Xen advisory embargoed until Thursday (see https://xenbits.xen.org/xsa/) and I am aware of at least one VM provider who scheduled emergency VM reboots across their entire fleet this week because the issue cannot be addressed through hot-patching.
I got around 10% of our Xen instances scheduled for reboot on Jan 4th, mostly long running instances. It's the first time I've seen that many instances scheduled at once.
IMHO, with RowHammer, the hardware is broken and it will continue to be broken until users complain enough --- maybe to the point of absolutely refusing to buy --- that the manufacturers and designers stop thinking "works 99.9999999999% of the time" is good enough: https://news.ycombinator.com/item?id=12410274
Why would users refuse to buy hardware that works 99.9999999999% of the time when they apparently have no problem buying software that works 99% of the time?
Radioactive decays and cosmic particles flipping bits give an upper bound for reliability. You are not going to see low-background packages and rad-hard chips in your iPhone.
Radioactive decays and cosmic particles flipping bits give an upper bound for reliability well below 99.9999999999%
If it works 99.9999999999% of the time, then it has a failure rate of 0.0000000001%, or 1E-12. Considering that a modern CPU executes approximately 1E9 operations per second, and that regular HDDs have a worst-case BER of 1 in 1E14 bits, 1E-12 is actually rather horrible, and the actual error rate of computer hardware is much better than that.
Imagine if a CPU calculated 1+1=3 once every 1E12 instructions. At current clock rates (roughly 1E9 instructions per second), that's one error every 1E3 seconds, a fraction of an hour. Computers simply would not work if CPUs had such an error rate.
I picked the 1E-12 number arbitrarily, but it's quite illustrative of the reliability computers are expected to have, despite their flaws.
Generally speaking, a piece of JavaScript in some 0x0 pixel iframe in a tab you're not even looking at can't summon cosmic particles to manipulate your computer's main memory. Rowhammer can.
No, but it’s straightforward engineering to prevent many of these problems. Error-correcting codes have been well understood for most of a century. Yes, it costs in performance and in money. For life-safety applications, it shouldn’t be optional.
Amazon is hosting life-safety applications in EC2. Commodity x86 hardware is grossly negligent for that environment.
ECC alone is absolutely insufficient. But ECC can be part of a system design that includes active monitoring and response. I’d expect that system design to also include measurement of ECC events under ordinary conditions, regular re-measurement, and funding for an analysis of the changes and explanation of the difference—just like you’d find in safety engineering in a coal plant, an MRI machine, any sort of engineering that has a professional scientist or engineer on site supervising all operations.
Of course, you’ll also find a tendency there towards specified hardware. They bend or break it to use COTS x86 machines, but—as I think I heard from a comment here last week—nearly nobody ever specified wanting AMT in the initial design, so it’s pretty weird that we’re all buying and deploying it.
Almost everything I've seen on error rates from radioactive decay and cosmic particles has been on servers in data centers.
I wonder if home systems are equally vulnerable, or if there is something about data center system design or facilities that makes them more susceptible?
I ask because I had a couple of home desktop Linux boxes once, without ECC RAM, that were running as lightly loaded servers. I ran a background process on both that just allocated a big memory buffer, wrote a pattern into it, and then cycled through it verifying that the pattern was still there.
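Something along these lines, as a minimal sketch (the 1 GiB size and the pattern are arbitrary):

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define PATTERN 0xA5A5A5A5A5A5A5A5ULL

    int main(void)
    {
        size_t words = (1ULL << 30) / sizeof(uint64_t);   /* 1 GiB buffer */
        uint64_t *buf = malloc(words * sizeof(uint64_t));
        if (!buf) return 1;

        for (size_t i = 0; i < words; i++)
            buf[i] = PATTERN;

        for (;;) {                          /* scrub forever, log any flip */
            for (size_t i = 0; i < words; i++) {
                if (buf[i] != PATTERN) {
                    printf("flip at word %zu: %016llx\n",
                           i, (unsigned long long)buf[i]);
                    buf[i] = PATTERN;       /* rewrite and keep watching */
                }
            }
            sleep(60);
        }
    }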
Based on the error rates I'd seen published, I expected to see a few flipped bits over the year (if I recall correctly) that I ran these, but I didn't catch a single one.
Later, I bought a 2008 Mac Pro for home and a 2009 Mac Pro for work (I didn't like the PC the office supplied), and used both of those until mid-2017. They had ECC memory, and when I checked the memory status I never saw any report that they had ever actually had to correct anything.
So... what's the deal here? What do I need to do to see a bit flip from radioactive decay or cosmic rays on my own computer?
I think it's multiplication. The odds are low but the number of potential instances is larger. Data centers have larger numbers of machines and those machines are doing repeated work where you observe the result.
Personal machines are typically limited by what your senses can handle. There are few of them for starters. They idle a lot. If many pieces failed inexplicably it's not likely to be something you are personally paying attention to with your senses.
(I have personally observed ram and disk failures on personal machines anyway. And I have seen stuff in my dmesg indicating hardware faults on my personal desktops, but rarely in a way that I notice in actual use not looking at dmesg.)
> I wonder if home systems are equally vulnerable, or if there is something about data center system design or facilities that makes them more susceptible?
I was told once that today's concrete has a much higher background radiation than brick and mortar from before the 50s. There is also more steel in data centres.
However, I'm not at all sure if background radiation of building materials is even in the right order of magnitude to matter here. Probably not.
A bit of a tangent, but somewhat related to your bit about concrete: steel salvaged from ships built before 1945 is less radioactive than modern steel and is useful for devices that are extremely sensitive to radiation: https://en.wikipedia.org/wiki/Low-background_steel
The reason the modern stuff is more radioactive is the massive number of atmospheric nuclear weapons tests conducted starting in 1945. I imagine the concrete has the same issue.
I think you need high energy radiation like cosmic rays from space to create problems. So those at higher elevation are at more risk. Heavy material like concrete may block this radiation.
Alpha particle emissions are common causes of single-bit errors, especially from ceramic enclosure materials in integrated circuits. Mitigating soft errors from circuit packaging is an active area of research in materials science. Parity bits and CRC error checking are needed precisely to reduce the impact of these errors down to manageable levels.
GP is saying the radiation is emitted by the component package itself, i.e. the random decay of particles in the ceramic surrounding an IC can cause errors.
You are not running uniformly random instructions on the CPU. It doesn't matter how many 9s there are in that percentage; if an attacker knows the 0.000...01% case, you have a problem. It actually makes it more insidious, since the chance that it occurs accidentally is basically zero (unlike previous CPU bugs).
The current miniaturization of DRAM circuitry doesn't really allow for a hardware fix for the RowHammer attack. During DRAM manufacturing, a test similar to the RowHammer attack is run, with certain bounds for passing. If the bounds were tightened to the level of perfection needed to prevent the attack, it would drop the yield a considerable amount.
> If the bounds were tightened to the level of perfection needed to prevent the attack, it would drop the yield a considerable amount.
The fact that DRAM older than a few years is effectively immune to RH suggests it is possible to manufacture such. Yes, it will cost more, but I think many would be willing to pay for it like they used to, if for nothing else than the assurance of having more reliable memory.
Isn’t the problem related to the capacity of current memory modules? So you can have, say, a 32GB module that’s vulnerable or a 8GB module that isn’t? (Assuming the 8GB module uses lower density DRAM chips)
A new 8GB DIMM module has about 1/4 the silicon area of a new 32GB module and the same density (often 1/4 the number of identical chips); only an old 8GB module, made with an entirely different process, would have larger and less dense safer DRAM cells.
There are many examples of hardware using unreliable underlying layers, countered with whitening/scrambling, redundancy, and other data-encoding tricks. What would make RowHammer impervious to these?
Those tricks cost latency. You can get away with hiding some of that latency in access time to persistent storage. It's much harder to do so with RAM.
ECC helps, and can be done at full rate, but isn't a complete solution for all possible problems. And anything you do in hardware at full RAM speed is expensive.
This is far away from "hardware fix is impossible" though. In reputable hardware, hardware vendors are expected to maintain correctness in spite of performance advances.
Also, DDR3/4 DRAM is glacially slow in latency terms, it's far from clear that there would be appreciable slowdown. There are already big latency compromises in the standardized JEDEC protocols that are not inherent in DRAM - it would be very two-faced for DRAM vendors to say they'll only trade off latency over backward compatibility or tiny cost savings, but not over correctness.
The trick I heard was to literally throttle the writes if you see repeated parallel writes or similar -- as row hammer depends on rapid writes that should be sufficient. But I also have no real idea how hardware actually works :) (magnets? how do they work?)
You don't have to overbuild silicon with huge margins; you can build a rowhammer detection mechanism and force a refresh when it triggers. That's what Target Row Refresh does. The problem is JEDEC didn't even bother enforcing it, instead making it optional :/
The internal caches have ECC but rowhammer targets the DIMMs. ECC DIMMs would solve the issue but Intel, for market segmentation reasons, fuses off the ability to handle ECC memory in consumer chips.
There is no way the average technology user is going to understand the architectural distinctions of DRAM implementations enough to base their purchasing decisions on that.
I agree, there needs to be an easy but also reliable way to show the problem. The FDIV bug received much public attention because it could be easily reproduced on Windows' built-in calculator.
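For the curious, the classic calculator reproduction was the division 4195835 / 3145727; something like this shows the same check (exact arithmetic gives 0, correct hardware gives at most a tiny rounding error, and the flawed Pentiums famously returned 256):

    #include <stdio.h>

    int main(void)
    {
        double x = 4195835.0, y = 3145727.0;
        /* Should be ~0; the flawed FDIV unit gave 256 here. */
        printf("%f\n", x - (x / y) * y);
        return 0;
    }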
If this last one can defeat ASLR, imagine leaking bit by bit from a co-hosted VM, to extract secrets of other cloud customers… This is the reason anyone serious about cloud security will reserve instances so that they won't share physical hardware with other customers (think EC2 dedicated instances).
Addendum: I know in most modern CPUs the memory controller is on-die, so my comment is partially wrong (RowHammer is definitely a SoC issue).
Also, if you're interested in this type of thing: Armv8.4-A adds a flag … indicating that you want the execution time of instructions to be independent of the data.
Now the primary source seems to have been edited (why?)… But the Web Archive still has it:
Data Independent Timing
CPU implementations of the Arm Architecture do not have to make guarantees about the length of time instructions take to execute. In particular, the same instructions can take different lengths of time, dependent upon the values that need to be operated on. For example, performing the arithmetic operation ‘1 x 1’ may be quicker than ‘2546483 x 245303’, even though they are both the same instruction (multiply).
This sensitivity to the data being processed can cause issues when developing cryptographic algorithms. Here, you want the routine to execute in the same amount of time no matter what you are processing – so that you don’t inadvertently leak information to an attacker. To help with this, Armv8.4-A adds a flag to the processor state, indicating that you want the execution time of instructions to be independent of the data operated on. This flag does not apply to every instruction (for example loads and stores may still take different amounts of time to execute, depending on the memory being accessed), but it will make development of secure cryptographic routines simpler.
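The same concern shows up at the software level, which is why crypto code already avoids data-dependent branches and early exits; e.g. the usual constant-time comparison idiom (a sketch in C, not from the Arm docs; the DIT flag is about guaranteeing that the remaining instructions themselves don't have data-dependent timing):

    #include <stddef.h>
    #include <stdint.h>

    /* Compare two buffers without an early exit, so the time taken does not
     * depend on where (or whether) they differ.  Returns 0 iff equal. */
    int ct_memcmp(const void *a, const void *b, size_t n)
    {
        const uint8_t *pa = a, *pb = b;
        uint8_t diff = 0;
        for (size_t i = 0; i < n; i++)
            diff |= pa[i] ^ pb[i];
        return diff;
    }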
The scope seems limited to ALU, so not really related to the TLB thing we have here. Also, it's still very far away, I'm not sure its predecessor Armv8.3-A is even shipping to customers yet.
I'm confused about the TLB impact. The pythonsweetness link claims these patches now require TLB flushes when crossing the kernel/user boundary, but the description of KAISER @ lwn[1] suggests that these flushes are unnecessary with "more recent" processors supporting PCIDs. How recent is "more recent", and is the PCID support likely to be ported back to earlier kernels along with KPTI?
TLB flushes for syscalls would be absolutely brutal for many performance-critical applications.
If the problem is rowhammer-style attacks on the TLB that let you map userspace-writable pages into the kernel address space, then any kernel entries remaining in the TLB when userspace is running are going to be a security hole. The problem won't be a process writing to the kernel entry (that would be forbidden by existing code / hardware) but a process updating its own TLB entries in ways that corrupt adjacent kernel ones. PCID doesn't help you here; indeed it hurts, because it means there are more TLB entries from the hypervisor or other virtual machines remaining in the TLB to be corrupted!
(Unless I have entirely the wrong end of the stick about this?)
I took OP to mean "rowhammer style" in the sense of a chip operation having unexpected physical effects on nearby transistors; not an attack literally identical to rowhammer.
No, the attack is simply a timing side-channel infoleak attack made possible by the TLB speeding up page fault handling to locate kernel structures mapped into the process' address space.
As far as I know, PCID-hidden entries in the TLB result in the same page fault as non-existent entries, so the page-fault handling becomes constant-time.
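For reference, the kind of measurement being discussed is roughly: take a candidate kernel address, access it from userspace, catch the fault, and compare how long the fault handling took across addresses. A rough sketch under those assumptions (the probe address is made up, and a real attack sweeps a range and averages away noise):

    #include <setjmp.h>
    #include <signal.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <x86intrin.h>   /* __rdtscp */

    static sigjmp_buf env;

    static void on_segv(int sig) { (void)sig; siglongjmp(env, 1); }

    /* Time a faulting access to `addr`; the claim is that addresses the
     * kernel actually has mapped fault with measurably different timing,
     * because of what is (or isn't) sitting in the TLB. */
    static uint64_t time_fault(volatile char *addr)
    {
        unsigned aux;
        uint64_t t0 = __rdtscp(&aux);
        if (sigsetjmp(env, 1) == 0)
            (void)*addr;                 /* faults; handler jumps back */
        return __rdtscp(&aux) - t0;
    }

    int main(void)
    {
        signal(SIGSEGV, on_segv);
        /* Hypothetical probe address in the kernel half of the address space. */
        printf("%llu cycles\n", (unsigned long long)
               time_fault((volatile char *)0xffffffff81000000ULL));
        return 0;
    }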
I dunno, it sounds like it might be easiest to go ahead and backport PCID along with these patches. It touches a lot of the same code, so trying to split it out might just create more problems.
Yes, and the impact on syscall-intensive workloads is bad enough with PCID, without it it's even worse. I'd be moderately surprised if only KPTI is backported and not PCID.
Usually I go into these things with a fair amount of skepticism, but given the Linux kernel's usual pace of development and the nature of undisclosed bugs we have seen in the past, a large hypervisor bug could well be the reality. It must be pretty bad if it's the kind of bug they can't really fix easily and have to push an entire new feature into something as old and important as the paging code.
The kernels for Gentoo have been all over the place for the past few weeks. I'm running 4.12 at the moment, then the repos updated to 4.14, which wouldn't build for me, so I waited a week for genkernel to modernize. When I came back 4.14 had been marked unstable and 4.12 was masked, making 4.9 the latest supported kernel. Seems that whatever is happening is a Big Deal.
And, for reasons that are entirely unknown, the issue got worse due to one of the PTI patches (written by, and hence tentatively blamed on, yours truly). Presumably it caused some minor change in code generation causing GCC to go nuts.
FWIW, the compile flag that Gentoo enabled activates a seriously busted GCC feature, and I'm a bit surprised that Gentoo gets away with it in user code.
Is there anywhere to read up on the bustedness of the stack probing feature? (apart from the obvious incompatibility with trying to do that for kernel code).
Probing more than a page size below the current stack pointer is wrong. Probing more than a page size further when one's saved frame area, save area, locals area, and (maximum) calling parameters area do not amount to a page in total is also wrong.
Also, the probe does an unlocked RMW (an "or" with 0) instead of a plain read, which is slower but also wrong in a multithreaded application. The latter is what broke Go.
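To make the probing concrete, here is roughly what that kind of stack probing amounts to at the C level (purely illustrative; GCC emits the probes as instructions such as "or $0, (%rsp)" while the frame grows, not as source-level code):

    #include <stddef.h>

    /* Touch each new page as a large frame grows, so the guard page is hit
     * in order.  The complaints above are that the real probes can land
     * more than a page below the live stack pointer, and that they are
     * read-modify-writes rather than plain reads. */
    static void grow_frame(void)
    {
        volatile char frame[3 * 4096];            /* locals spanning 3 pages */
        for (size_t off = 0; off < sizeof frame; off += 4096)
            frame[off] |= 0;                      /* conceptual "or $0" probe */
        frame[0] = 1;
    }

    int main(void) { grow_frame(); return 0; }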
So yes, It's a terminally broken compiler from hell. I assume gentoo has applied some completely broken security patch to their compiler, turning said compiler into complete garbage.
Linus clarified later that he could reproduce on Fedora with the relevant build flag (-fstack-check). His initial assumption that this was caused by an out-of-tree patch specific to Gentoo was incorrect.
The patch to "fix" it is explicitly disabling -fstack-check for the kernel build. I believe that will go out in 4.14.11 (it is not in 4.14.10).
Shouldn't cloud-grade computers be immune to rowhammer (or at least, shouldn't rowhammer be much less efficient), as they typically use ECC RAM? Flipping bits in ECC RAM in a way that also modifies the checksum in a deterministic way is (was?) not practical?
ECC doesn't protect you from all rowhammer problems, because rowhammer can flip more than two bits at a time, which is the limit that ECC can detect.
"Tests show that simple ECC solutions, providing single-error correction and double-error detection (SECDED) capabilities, are not able to correct or detect all observed disturbance errors because some of them include more than two flipped bits per memory word"
OTOH, a rowhammer attack on ECC memory will likely flip 1 bit before it flips 2, making attacks theoretically detectable. Without ECC, there's no clear way to detect an attack.
The ECC memory controller performs memory scrubbing periodically, in the background, during which it checks parity and corrects any bit flips. Otherwise ECC would not work nearly as well as it does.
AFAIK, row refresh is done within each memory chip, while the ECC bits are normally on a separate chip (for instance, where a non-ECC module has 8 chips, an ECC module has 9 chips), so ECC scrubbing has to be done in the memory controller.
Yes, one would think so. And the mitigation patches in Linux suggest a CPU bug that could be fixed in future CPUs, not just rowhammer-like attacks related to memory. So I still think the attack path may be different.
The bit flip would be done in a normal fashion - i.e. a command issued that changed the memory location, and hence updates the checksum. The occasional (about one bit flip per Terabyte per hour I think, on average) stray cosmic ray inducing a momentary over-voltage causing the checksum to now disagree would hopefully be within the design's ability to flip back.
It's a hardware-level thing. Essentially, when you start rapidly flipping a single bit, that starts to 'leak' some current into the adjacent physical bits. This then allows you to flip one of those adjacent bits, especially if you can control bits on both sides of your target.
It's like you are using the bits you can control to 'simulate' an actual stray cosmic ray.
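The core of the attack is a tiny loop; the hard part (not shown) is finding two addresses that land in rows physically adjacent to the victim row, in the same bank. A rough sketch with placeholder addresses:

    #include <emmintrin.h>   /* _mm_clflush */
    #include <stdint.h>
    #include <stdlib.h>

    /* Repeatedly activate the two "aggressor" rows; clflush forces every
     * read to go all the way out to DRAM instead of hitting the cache. */
    static void hammer(volatile uint64_t *above, volatile uint64_t *below,
                       long iterations)
    {
        for (long i = 0; i < iterations; i++) {
            (void)*above;
            (void)*below;
            _mm_clflush((const void *)above);
            _mm_clflush((const void *)below);
        }
    }

    int main(void)
    {
        /* Placeholder only: a real attack picks the two addresses from DRAM
         * geometry, then re-reads the victim row in between, looking for
         * flipped bits. */
        uint64_t *buf = malloc(1 << 20);
        if (!buf) return 1;
        hammer(buf, buf + 8192 / sizeof *buf, 10 * 1000 * 1000);
        free(buf);
        return 0;
    }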
One question I have around this is whether the patches made to the Windows kernel in November exhibit the same performance hits. Does anyone know?
I'm due to refresh my gaming PC, and I was going to go with Intel again as they've not been a problem. However, if Intel chips are going to incur the same 5% - 50% performance hit on Windows, I might end up investing in AMD hardware instead.
I'm certainly no expert, but I would guess that the context-switch time (where I believe the new overhead is added) in games is fairly small compared to the raw number-crunching in-process, so the effect would be minimal in any case
Every draw call will need to transition to kernel space to send data over the PCIe bus to the GPU. Modern games execute something on the order of 1000+ draws per frame, so assuming 60fps that's going to be at least 60,000*2 context switches into the kernel and back per second, more if you're doing high refresh rates.
How big the impact will be, I don't know - but I wouldn't be surprised if it was a couple of percent (effectively ruining the single-threaded performance lead Intel has over AMD in gaming, before accounting for overclocking).
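Back-of-the-envelope, with a per-transition cost that is purely my guess:

    1000 draws/frame x 60 frames/s x 2 transitions = 120,000 kernel transitions/s
    at an assumed extra ~300 ns each for the CR3 switch / TLB effects:
    120,000 x 300 ns = 36 ms of added CPU time per second, i.e. a few percent of one core

So "a couple of percent" seems like the right ballpark, unless the per-transition cost turns out much higher than that.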
Just curious: Even after an attacker goes through all the effort of finding out the physical address of the memory location they want to manipulate, how would someone make sure to get an adjacent memory location to even attempt to execute the Rowhammer attack? And even then, the smallest memory units allocated are basically pages within page frames, right? So if your target memory row is within a physical page frame, does the RowHammer attack even work? (Since there's no adjacent row an attacker has access to then.)
If it’s something that can be triggered from browser JavaScript, maybe it can be attempted N times per second, and there is a mathematical probability of gaining full privileges within a set amount of attempts.
As someone who shares your skepticism of the cloud, I can say that people don’t switch from bare metal hosting (something like SoftLayer) to AWS/GCP for the cost.
If you do the math like “we have 1000 cores and 2048 GB of RAM and 10 TB of RAID’ed SSD” and then plug that into the GCP calculator... it’s going to be at minimum 1.5-2x your bare metal cost.
That’s not even including bandwidth which is pretty much free at bare metal hosts unless you’re doing a lot of egress.
The calculus changes when you realize that you’re over-provisioned on the bare metal side for a variety of reasons: high availability, “what if”, future growth that’s more medium term than short, etc.
Then you scale back the numbers you’re plugging into the calculator and things are still expensive but now within reason.
Couple that with things like global anycast region aware load balancer, firewalls (an in-line 10GigE highly available firewall costs a lot of money), ability to spin up hundreds of cores in 5 seconds and the value proposition becomes clearer.
It still depends on your work load, but there’s a lot more to consider than just straight up monthly cost.
Totally agree. Cloud makes tons of sense if your workload is really dynamic. Lots of small players are running static workloads though because actually setting up dynamic workloads is pretty complex.
I use GCE for DNS, Storage, CDN (for fronting storage backed files), dynamic workloads that can run on preemptible instances, and scalable instances to serve published static content, but I use dedicated servers for databases, elasticsearch, redis, and application servers fronting those things.
You are right; however, I think the number of users who use cloud server instances because they really need that dynamic scalability is much smaller than the number who use them just because.
I have to disagree. If you look purely at hardware cost of bare metal vs. what the same compute costs on cloud then sure, cloud is more expensive.
> It is just as easy to automate [..]
It's really not. As someone who's done provisioning automation at 2 companies, this is hard. Hardware is difficult; every new generation of hardware introduces new challenges in provisioning, and the more hardware configurations you need to support (different vendors, all kinds of PCI plug-in cards, etc.), the more likely things are to go wrong. It takes a full team to build, maintain and debug this. It takes a couple of hours to build a GUI that calls the GCP APIs to provision an instance for you, assuming you even need to do this instead of just using the Cloud Console directly. Sure, you pay for it, but now you have 4-10 engineers freed up to do something that provides actual value to your business.
> [..] if you plan well [..]
If. But that's really hard. Capacity planning and forecasting is complicated and the smaller a player you are, the harder it'll be for you to get a decent vendor contract with significant discounts and to be able to adjust and get to hardware quickly outside of your regularly forecasted buy-cycle. On the other hand, it's not your issue in the cloud. You request the resources and as long as you have the quotas, you'll get it (with rare exception).
> [..] and more secure [..]
I severely doubt that. In most cases, though you can host your stuff in certified DCs, you'll still be in a colocation facility. Most cloud providers have their own buildings or rent complete buildings at a time; no one else but them has access to those grounds. Aside from that, take a look at what Google, for example, does on GCP to ensure that their code and only their code can boot systems, and how they control, sign and verify every step of the boot process[0]. I've yet to see anyone else do that, and I doubt most companies that do bare metal have even thought of this or have the knowledge to execute on it.
Aside from all of this, cloud isn't competing with just providing you compute. VM's (GCE, EC2) is just the onboarding ramp. The value is in all the other managed services they offer that you no longer need to build, maintain, scale and debug (global storage and caching primitives, really clever shit like Spanner or Amazon RDS/Aurora, massively scalable pub/sub and load balancing tiers, autoscaling, the ability to spawn your whole infrastructure or your service on a new continent to serve local customers in a matter of minutes etc). If all you're using cloud providers for is as a compute provisioning layer, then you're doing it wrong.
> It takes a couple of hours to build a GUI that calls the GCP APIs to provision an instance for you
Yes, but you will hit all the same problems with different hardware generations, different configs with different limitations, etc. If anything GCE and AWS have more complex offerings than most bare metal hosts. And you have all the same maintenance issues as you run stuff over time and hardware and software updates get released.
> Capacity planning and forecasting is complicated
AWS and GCE certainly don't make it easier. And if you can't capacity plan accurately on cloud and take advantage of spot pricing and auto-scaling then you will be paying 10X price, which describes most smaller players.
> I severely doubt that [bare metal is more secure]
I am saying that shared hosting is fundamentally insecure. No matter what else you do, if you let untrusted people run code on the same server that is a huge risk that assumes many, many layers of hardware and software are bug free.
> cloud isn't competing with just providing you compute
I agree on this. But not all of those services work as well as advertised either.
> Yes, but you will hit all the same problems with different hardware generations, different configs with different limitations, etc. If anything GCE and AWS have more complex offerings than most bare metal hosts.
I haven't hit any issues with hardware generations. At worst what I've had to do is blacklist a GCP zone b/c it misses an instance type I need. In most cases I don't need to care and images that can boot are provided and maintained by the respective cloud provider, so you can build on top of that. I don't need to source or test components together, or spend hours figuring out why this piece of hardware isn't working well with that one. Or why this storage is slower than the other disk with the same specs from a different vendor. I don't need to lift a finger or deal with any hardware diversity issues, I just do an HTTP POST and less than a minute later I have an GCE instance available to me. Though in most cases I don't even do that, I just instruct GKE to schedule containers for me. I also don't need to worry about any hardware renew cycles, deal with failing hardware, racking and expansion of my DCs and what not.
The reason GCP and AWS have more complex offerings is b/c they can afford to provide it. Due to their scale they can shoulder the complexity of letting you chose from a vast array of different hardware configurations, which usually also results in better utilisation for them. Most people can't, which is why bare metal host options are much more constrained. And as a consequence why a lot of resources are wasted b/c it's especially hard to find someone supporting small instance types for just bare metal.
> AWS and GCE certainly don't make it easier.
To me they do. I don't need to deal with the hardware. I don't need to plan buying cycles, account for production cycles and chip releases by manufacturers and factor in how that's going to affect supply, or how an earthquake in Taiwan will make it prohibitively expensive for me to get the disk type I normally want to. I still need to do capacity planning, but I can tolerate much bigger fluctuations in those, and people's usage patterns, in the cloud than I can on bare metal. Unless I want to have hundreds of machines sitting idle, just in case I might need them.
But the best thing is, if I get it wrong in the cloud, I can correct, in a matter of minutes if I want to. Too big instance types? OK, I'll spin up smaller ones, redeploy and tah-dah my bill goes down. Sure you could do that on bare metal, assuming you can even get to a right/small enough instance type, but it's far from this easy in most cases.
> And if you can't capacity plan accurately on cloud and take advantage of spot pricing and auto-scaling then you will be paying 10X price, which describes most smaller players.
But then we're back down to trying to use the cloud just for compute, which is not what you should be doing and not where the value of a cloud offering comes from.
> I am saying that shared hosting is fundamentally insecure.
Though that's definitely true, security isn't black or white; it's not secure vs. insecure. Something that you might consider an unacceptable risk (theoretical or practical) might be entirely fine for someone else. There are definitely cases in which this would be of major concern, but for most people it really isn't. Aside from that, as hardware designs change and software mitigations are deployed, we're able to achieve stronger and stronger isolation. Eventually, for all intents and purposes, this will be solved.
> if you let untrusted people run code on the same server that is a huge risk that assumes many, many layers of hardware and software are bug free.
This still holds true even if you only let your own people run code on the same instance (unless you're also only running a single process/app per server?). It becomes a bit more problematic, but there's also a lot more research going on in this area than a few years back. We're discovering issues, sure, but we're also getting better and better at mitigating them.
> But not all of those services work as well as advertised either.
True. Every cloud provider could do better. But then, I'd like to see anyone attempt and succeed at what AWS, Google and Microsoft (or smaller shops like Digital Ocean, Rackspace) etc are doing, at their scale and with a staggeringly diverse portfolio of services and high SLAs. All taken care of for you, so you can actually assemble their primitives into useful things for your business, instead of needing to spend months and multiple teams to build the building blocks in the first place (and then also the cost of continued development and maintenance of these capabilities, and of course adding more and more of these capabilities yourself as your organisation's needs evolve).
Though isn’t it only a problem when two VMs share the same physical machine? If yes, then all you need to do as a client is never rent a fraction of a physical machine, but specs that correspond to a full machine.
> If yes, then all you need to do as a client is never rent a fraction of a physical machine, but specs that correspond to a full machine.
Cloud vendors don't guarantee colocation of your resources unless you specifically arrange that. And for that matter, often you specifically don't want co-location, because you want redundancy and migration.
Reverse engineers pretty much know how everything in NT works. Microsoft publishes enough symbols that it's even possible to automatically decompile much of the code. Something like page table splitting would be obvious.
The source code of the kernel of Windows Server 2003 and Windows XP Pro x64 was available to universities for educational purposes. Someone leaked the code on the internet years ago, and it is now all over GitHub (search WRK-v1.2). The code doesn't include the NTFS module.
Actually, given the horror stories that have been told about the Windows source code over the years, I'd be very, very surprised if there wasn't 20-year-old code in a lot of places.
Hell there's still 20 year old code running in the Linux kernel
The author of that citation (Alex Ionescu) is known for reverse engineering the NT kernel and, in some cases, implementing features for ReactOS accordingly. He would have been referring to the NT kernel, which he seems to disassemble and comment on with every update.
This might seem like too easy a theory, but if you look into the article posted by jedisct1, there is the somewhat unrestricted access to the L1 cache, along with multiple mentions of Rowhammer; could it just be rowhammering from whatever the L1 cache accesses?
So, I googled quickly but couldn't see anything obvious... does this affect Linux running on IBM Z Series mainframes or not? I haven't seen much about whether Power CPUs are affected by the same flaws.
Alternatively, use Firefox (on your mobile) to skip AMP and other silliness. Greatly improved my mobile browsing experience; haven't looked back (uBlock Origin, hint hint).
Firefox is pretty slow on my phone and sometimes when I try to search something, nothing happens. I want to like it, but chrome is just a lot smoother :'(
I do love how Google tries to "downgrade" its experience on Firefox mobile, but all it really does is cut out all the javascript and material design bullshit.
When was the last time you tried Firefox on mobile? Since Quantum landed on the Android version, it's become really smooth - comparable to Chrome on my phone (Pixel), at least.
Even though I understood less than 50% of that I am still very excited about reading more about whatever the real issue is. If somebody can pwn aws from a random instance that would be highly amusing to me :D
They weren't really trying to uncover the exploit such that they can reproduce it. They were trying to learn who the exploit affects and what the impact is. I don't think there's anything wrong with that. If you're an AWS customer who depends on hypervisor isolation for critical security guarantees, it helps you to know that this is threatened and perhaps exploitable.
Please don't buy into the idea that embargoes and coordinated disclosure are sacred. They tend to just reinforce existing power structures, sometimes in an unethical (or at least unfair) way.
The CCC also stated that they have observed companies taking a more reactive rather than proactive stance on their IT security, because they believe they will be notified of vulnerabilities prior to public disclosure or attacks. This may justify not following embargoes and coordinated disclosure.
I'd expect the incentives to be a bit more complicated than that, and I'm also a bit skeptical that either is all that good of a solution. I'd also like to see how exactly "proactive" and "reactive" are being used here, is it about push vs pull for vulnerability notifications, or about hiring their own security researchers, or... ?
> Please don't buy into the idea that embargoes and coordinated disclosure are sacred. They tend to just reinforce existing power structures, sometimes in an unethical (or at least unfair) way.
They're an attempt to minimize harm, by getting things patched while minimizing information leaked to blackhats.
Just because giving preference to groups with a better reputation and more market share isn't "fair", doesn't mean it's automatically wrong. Now, if you can show that it actually doesn't help . . .
People are generally willing to make the slightly less safe assumptions that targeted and mass attacks are different threat models, and that coordinated disclosure might not help against the former but does help against the latter.
People are willing to make these assumptions because they correspond more closely to reality.
Once there is disclosure then 100% of users can make the choice to take appropriate mitigation steps.
Prior to disclosure there will always be the possibility that some users are being exploited without their knowledge.
Therefore disclosure always improves the situation by giving those who could have been exploited without their knowledge the choice to take mitigation steps.
All of the "responsible disclosure" nonsense is just PR by companies who want to avoid the most obvious mitigation, which is for customers to stop using their products.
Depends on the exploit, but all exploits can be mitigated by stopping your use of the exploited product. That is what companies don't want to happen so they would prefer to sacrifice their users' security and wait until they have a fix before the exploit is disclosed.
In this specific scenario what's the mitigation for cloud customers? Or even cloud providers? A customer can't migrate all of their infrastructure before a packaged exploit can be distributed. A provider can't dedicate hardware for every single customer. Let's be realistic here.
This is the classic case of the frog being boiled. Or the pig getting lazy...
At every step along the way, there's been a choice of "Well, we could own the hardware and incur overhead costs, or we could trust someone else and pay our share of lower overhead. It'll mean giving up some control, but it'll save us a few bucks."
Or maybe it goes like "Well, we could develop with practices that result in more robust code, but we'd be slower to market."
There's definitely a sidetrack of "If we crank up the clock too much, all sorts of things get wibbly and we can no longer guarantee that the outputs match the inputs, but we don't actually have ways of doing it correctly at these speeds. The press will slam us if we don't keep pace with Moore's law, how could we launch a product with only marginal speed gains?"
And pretty often I think it sounds like "The ops staff says they're overworked and we need to add people or we risk an incident, but Salesman Bob says we can actually fire most of them if we put our stuff in BobCloud."
At every step along the way, someone made a conscious choice to do the insecure thing. The folks with their eye on security were dismissed as naysayers, and profit was paramount. And because these practices became so common, they became enshrined in market norms and expected overhead costs.
> A provider can't dedicate hardware for every single customer.
A provider absolutely could dedicate hardware for every single customer, that's literally how every provider operated before virtualization. It just wasn't as profitable as virtualization.
The story of the Three Little Pigs was supposed to teach about the importance of robust infrastructure. Nobody should be surprised when the wolf shows up. And every pig had the choice to build with sticks or bricks, it would just take more work or cost more.
And I see your message as saying "Are you serious? Build with something other than straw?! But we already own so much straw! All the pigs have straw houses, won't someone think of the pigs?"
Meanwhile the bankrupt brick vendor's assets have been auctioned off, and the wolves are salivating.
One might migrate some super-important instances to bare metal or cloud hardware under full control. If this is an embargo for a bug that could make VMs on the same hardware attackable, it should be public.
Get off the cloud. Which is exactly why the companies involved would want to keep it secret.
Not everyone is all in on cloud infrastructure. What about people who are right now deciding whether or not to move critical data to the cloud? Should their security be compromised by hiding the truth about a known exploit in order to "protect" people less concerned about security who already put their data at risk?
How is Netflix going to move off the cloud overnight?
If this vulnerability applies regardless of where it is, what is the mitigation then? Move your machine where? Your suggestion is not practical and you know it. The right mitigation is one that actually fixes the vulnerability. No one shut down machines and migrated everything to another distro because Debian had a bug in generating private keys a decade ago, even when the bug was a zero-day.
No they don't. They prevent everybody with ill intent who doesn't already know about it from finding out about it from the guys without ill intent. That's all.
The reality is that there's a group you're excluding from consideration, and I don't think it's unreasonable to conclude that embargoes keep every random "hacker" from taking advantage of it in a packaged form. Being vulnerable to the minority that can and would take advantage of it, versus anyone who would, is a valid consideration and shouldn't be dismissed so casually.
would you say it prevents the small fry from having a big impact?
edit: I think the mirai botnet(s) had some sort of power struggle? avoiding propagating knowledge about it prevents it from being easily exploited by more actors.
All of the serious hackers with resources have a paid mole involved in the embargo discussion mailing list. The most dangerous people already know.
Embargo is simply a way to make sure the huge, rich cloud providers don't have their reputation tarnished at the expense of everyone else. "Stay with bigco, we fix things before everyone finds about it"
Exactly. This is just some random nerd. There are people all over the world whose full time job it is to track open source projects to develop exploits.
and when script kiddies get wind that there is something potentially disastrous in the open, it can be exploited 10 times harder, that's all I'm saying.
I understand (and agree) that the system admins/owners should also be able to mitigate through knowledge, but it's a dilemma that I think is better resolved by the other solution
(In this case it's apparently a complex issue, but history has shown that there are surprisingly easy-to-exploit bugs/issues; see Heartbleed and Shellshock, which was apparently exploited very quickly.)
No "script kiddies" can write exploit code for something on this level.
This guy is not releasing an exploit implementation, he is just pointing to the existence of a potential exploit that has a patch in development. He can't even figure out exactly what it is.
The only people who would be able to code this exploit would be the ones who already figured it out before this guy.
If you're taking off for vacation and forgot to lock your door, the best thing would be to go back and lock it. If you couldn't get the door locked right away for some reason, you probably wouldn't want the news of your unlocked door broadcast through your neighborhood...
Obscurity actually is a layer of security. The mistake is when people are dependent upon it.
Your analogy is severely flawed, as my door lock is under my control and I know about the risk (i.e., it is unlocked), so I can take the steps I need to mitigate that risk.
For your analogy to apply here, it would have to be the manufacturer of the door lock having a master key stolen and then not telling anyone about it until they have a new lock for you to buy from them. In the case of a lock, I would want to know that the lock is useless, even if there were no solution yet, so I can mitigate the risk rather than simply keep locking it believing it to be secure.
I see, so now we are advocating the suppression of speech... Because how else are you going to prevent that? Pretty sure there are hundreds of YouTube videos on lock picking; do you want to ban all of them?
But this isn't one person's door that's unlocked. This is more like a company that shipped a faulty door lock and is trying to keep it secret until they can ship fixed doors to everybody.
But wouldn't you want to know if your door is faulty, so you can either replace your door, delay your vacation until the new door arrives, or beef up your security system?
No one's hiding anything; this patchset was developed in the open for many, many months. The hysteria and intrigue in this random Tumblr blog is completely superfluous. It's a hardware bug anyway.
Here is a good hint to when something is not being embargoed: there is a paper and a public demonstration.
But no one is hiding the bug this "fixes". If this bug + something else can be a hypervisor escape, well that's too bad for the AWS of this world, but I don't see how this patchset would leak the "something else" we don't know about?
It's not unusual for folks implementing popular OSes to get a heads-up that some sort of Significant Security Issue is being discussed by other vendors, including receiving advice.
I consider this newsworthy because many very good security engineers in my feed agree it is.
What do you think this fixes? A tiny info leak of kernel addresses? There are still other, more reliable ways to get those (even if there is active work to remove them), and I don't believe that alone would lead to a semi-rushed patch with a 5% mean (and sometimes 30%) performance degradation being enabled by default, with Linus himself expecting it to be backported (rarely if ever seen for a change of this importance, and it would make no sense given that older kernels are even more full of simpler kernel-address info leaks).
This fixes something bigger, something Intel could not fix with a microcode update...
No, not AWS. Xen was called safe. All the other KVM providers and normal Linux servers on Intel are called out. ARM, SPARC and s390 were called out as safe, as they provide two separate translation-table registers, while Intel provides only one.
I don't think ARM64 works the way you think it does. On s390, there's a register for user-initiated access and a register for kernel-initiated access. On ARM64 (AIUI), there's a register for low (user) addresses and a register for high (kernel) addresses. So kASLR timing leaks on s390 shouldn't happen in the first place unless the TLB tagging itself is rather silly, but ARM64 has no inherent protection.
What ARM64's system does provide is a much simpler way to do a PTI-style pagetable split by twiddling the high address register at entry and exit.
Isn’t Amazon moving off of Xen? Perhaps their involvement is limited to those working on their new KVM-based hypervisor? They also recently dropped their dedicated host requirement for HIPAA customers (to catch up with GCP).
As the article says, rushing through a change this large with a performance penalty just to fix KASLR is very unlikely. KASLR is a fairly weak protection that has been broken many times (particularly on Windows – and yet Windows is still rolling out a similar patch).
It's even worse than that. KASLR is only a mitigation, not an architectural protection (and "breaking KASLR" does not give you an exploit by itself; you still have to find another bug), so there is absolutely no way Linus would consider that just fixing KASLR warrants the (somewhat rushed) inclusion of this patch, let alone enabled by default...
It sounds like Intel, Google and Amazon are hiding something. Wouldn't want customers thinking that cloud computing is fundamentally insecure now would we?
Well Intel for one manufactures the insecure CPUs..
This will be merged for 4.16, when there is no 4.15 release yet. No idea what your cloud computing companies run but it's not 4.15-dirty, and backporting this monster is a great recipe for a nightly emergency when it goes OOPS.
I agree, they're very scary. Also, watch out: a kernel with PTI on will not function in KVM-emulated secure boot mode until a KVM fix gets backported as well.
Some of these patches are already in stable-marked kernels. In fact, I'm already running them. There was also an issue where some different default GCC flags that hardened Gentoo uses were causing the kernel to fail to boot.
Yep. The kernel maintainers are being so aggressive with this that I'll be very concerned if there isn't a major security issue coming down the pipe. I know that Linux is considered fast and loose in various circles of old-school hackers, and granted, but merging this type of fundamental change into stable is not normal. There's something brewing here that is pushing this to completion.
Sure, with the appropriate baking time. But I don't see a cloud company taking an intermediate version of this patchset, backporting it themselves and then sending it out to all their customers in a hurry.
It is not, but what if, as was the case for rowhammer, it were linked to a specific instruction (clflush)?
This time it could be an AVX-512 instruction (Intel-only) that leaks kernel addresses in one way or another.
I was talking from an ISA perspective.
For example, even though clflush may be implemented differently by Intel and AMD, it has the same effect on system RAM, hence a shared exploit.
> It is not, but what if, as was the case for rowhammer, it were linked to a specific instruction (clflush)?
No, rowhammer does not need clflush. All rowhammer needs is the ability to write to the same physical memory locations repeatedly. Normally the cache would get in the way, so the attacker needs to bypass it. Flushing the cache (clflush) is one way, but there are others; AFAIK, rowhammer has been demonstrated from within a JavaScript VM, which has no access to clflush.
Yes, I knew you could do rowhammer on ARM too, where clflush does not exist. So rowhammer is not the correct example.
Yet at some point it was believed that it was necessary on x86 for the attack to work. https://en.wikipedia.org/wiki/Row_hammer#Exploits
I just made a bet that I could guess something from the ISA alone.
Going macro to describe what may be the issue. I'm just doing guesswork here.
I was not implying Intel has the same implementation as AMD, nor was I making a case for "this is like rowhammer".
https://lkml.org/lkml/2017/12/27/2
AMD processors are not subject to the types of attacks that the kernel page table isolation feature protects against. The AMD microarchitecture does not allow memory references, including speculative references, that access higher privileged data when running in a lesser privileged mode when that access would result in a page fault.
Disable page table isolation by default on AMD processors by not setting the X86_BUG_CPU_INSECURE feature, which controls whether X86_FEATURE_PTI is set.