
Does anybody know if Retpoline will make it into the compiler that the Linux kernel is built with? The paper doesn't say specifically, so I can't figure out what was actually compiled using Retpoline: userland or the Linux kernel itself?



Disclosure: I work on Google Cloud.

Patches for both LLVM (the infrastructure behind clang) and gcc are available. You choose what you compile your kernel and applications with, and others are actively looking at retpoline and retpoline-inspired techniques for other code generators (e.g., various JIT compilers). That's why Paul and the folks made this public.
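To make "what you compile with" concrete, here is a minimal sketch of the kind of code retpoline affects: a call through a function pointer, which the compiler normally lowers to an indirect branch. The operation table and the names `dispatch`, `ops`, `add` and `sub` are hypothetical, chosen only for illustration. The flags in the comment are the names used by released clang and gcc; treating them as equivalent to the patches mentioned above is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical example, not taken from any Google or kernel source.
 *
 * Build with a retpoline-capable compiler (flag names as shipped in
 * released toolchains; assumed to match the patches discussed here):
 *   clang -O2 -mretpoline -c dispatch.c
 *   gcc   -O2 -mindirect-branch=thunk -c dispatch.c
 */

typedef int (*binary_op)(int, int);

static int add(int a, int b) { return a + b; }
static int sub(int a, int b) { return a - b; }

enum { OP_ADD = 0, OP_SUB = 1, OP_COUNT };

/* Table of function pointers; the call target is chosen at run time. */
static const binary_op ops[OP_COUNT] = { add, sub };

/*
 * The call through ops[which] is an indirect branch, so the CPU has to
 * predict its target. Variant 2 attacks that prediction. With the flags
 * above, the compiler routes such calls through a thunk instead of
 * emitting a plain indirect call; the C source itself does not change.
 */
int dispatch(size_t which, int a, int b)
{
    assert(which < OP_COUNT);  /* precondition: index must be in range */
    return ops[which](a, b);
}
```

Calling `dispatch(OP_ADD, 2, 3)` returns 5 whether or not the retpoline flag is set. The mitigation changes only the generated machine code, which is why the same patches can apply to a kernel, a hypervisor, or a userland program.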


Thanks for your reply. In Google Cloud's specific case, what was compiled using Retpoline? Was it the kernel or userland?

Because you are more or less claiming that the patches the Linux kernel used to fix this (with a 10% drop in performance) are no longer needed. I am wondering whether this was submitted to the Linux kernel to be mainlined. If yes, why was it not used there?

If no, then it means it is being used in some Google Cloud-specific way that is not mainlineable.

EDIT - found this comment which seems to suggest that retpoline is not bulletproof and the kernel's performance-killing patches are still needed

https://github.com/marcan/speculation-bugs/blob/master/READM...

http://lkml.iu.edu/hypermail/linux/kernel/1801.0/03137.html

Does it mean that Google Cloud is doing this only on non-Skylake CPU instances? It is a very interesting stance: it would mean that it is suddenly more cost effective to use Google's OLDER machines than the newer Skylakes, because the newer machines have a performance degradation that the older machines do not suffer.


Disclaimer: I'm not Paul :).

As mentioned elsewhere, we recompile everything at Google all the time. I'm not sure which things we've rebuilt with retpoline enabled. As Paul mentions in the article, the point is for software you believe needs to be protected, which may not be everything we build.

That thread on retpoline on Skylake has a lot of confusion. Some folks aren't 100% certain it works (it relies on understanding internal details of Intel CPUs), and they argue that IBRS on Skylake is cheap enough, so "why not just always use IBRS and not bother?". That's the gist of this comment:

> personally I am comfortable with retpoline on Skylake, but I would like to have IBRS as an opt in for the paranoid.

I want to highlight that Paul and the team have had a lot more time to think about this issue than the folks just joining the discussion. Could our folks be missing something? Sure, and that's the point of public discussion and code review. We hope that over the coming weeks and months it's decided one way or another, but we believe retpoline to be correct and a good optimization (especially for older hardware).


> I'm not sure which things we've rebuilt with retpoline enabled. As Paul mentions in the article, the point is for software you believe needs to be protected, which may not be everything we build.

With all due respect, I'm not sure about the answer you are giving me. I don't see why we are using the word "belief". If there is some secret sauce that Google is unwilling to talk about, then it would be good to have that declared. Because it is extremely weird that you would claim to have a compiler tech that fixes this issue (which needed kernel-level fixes otherwise) and not be able to categorically declare what was compiled with Retpoline to get this benefit.

Possibly it's the hypervisor you are using... which explains the zero downtime migration.

P.S. I'm a Google Cloud customer as well.


I meant that only in the “I’m not sure all code at Google has been rebuilt and had retpoline enabled” sense. We said in the article:

> By December, all Google Cloud Platform (GCP) services had protections in place for all known variants of the vulnerability.

which means all the bits of the stacks across all our GCP products (i.e., not just our host kernels and GCE’s hypervisor but also, say, App Engine sandboxes).

In practice that is KPTI for kernels plus recompiling any sensitive thing (kernel, hypervisor, etc.) with a retpoline-capable compiler. We’re not holding back secrets. Just trying to get the patches out while keeping blog posts relatively parseable by the general public.

Note that no matter how disruptive kernel or other changes are, live migration always makes them “zero downtime migration”. The key here is that using retpoline to protect systems software against Variant 2 results in much lower overhead than other proposed mitigation strategies, particularly on older hardware but even a bit on Skylake.



