The PoC basically tests that a websocket request which results in a non-101 response gets its socket closed by the server. That is a side effect of the fix. The way the proxy code works is that, on seeing an Upgrade header, it first hijacks the HTTP request and dials the target. It then starts a copy loop, but reads the first bytes into an http.Response; if the response is not a 101, it exits.
The side effect is that, because the connection has been hijacked, the Go server will not reuse the TCP connection and will close it. But according to the HTTP spec that TCP connection is still valid for reuse, so the server does not actually have to close it (it just happens to). Where this PoC could fall apart is if you are running kube-apiserver behind a particularly pedantic LB (ALB might apply here). Most LBs will not reuse any TCP connection that carried an Upgrade header, but a smart one that reads the response could reuse it if the upgrade didn't result in a 101. From that perspective, a load balancer (or reverse proxy) is not obligated to close the TCP connection to the client.
To fully test the vulnerability you need to find a pod you can exec to, and another pod on the same node that you are not authorized to exec to. First send a malformed pod exec and get the open socket. Then send another request over that socket using the kubelet API to exec to the pod you are not authorized for. Then see if you get a valid response (which I believe is a callback URL for the actual tty stream).
The PoC is still basically valid, because at worst it gives you a false positive that you are vulnerable, not a false negative that all is well.
I did briefly consider this when originally throwing together the PoC. It seemed reasonable at the time to test for the behaviour of the kubernetes apiserver as implemented, not how it could theoretically behave if implemented differently. In the ALB/L7 type use case, I'm not sure a reliable PoC can ever exist, because even if you do successfully exploit it, the second request can be load balanced elsewhere.
Also, I don't believe you would necessarily need a separate pod; the PoC could work by using different endpoints, such as exec vs logs, or better still, by trying to hit a non-pod-related API endpoint within kubelet. Unfortunately I threw the PoC together quickly, and didn't have a chance to revisit it today to see if it can be enhanced.
If you're an operator still in the muck or need a quick way to test, check out our vuln PoC: https://github.com/gravitational/cve-2018-1002105
There's no such thing as an HTTP/2 websocket. Websockets are a purely HTTP/1.1 concept, and they're only HTTP/1.1 until the moment that the `Upgrade` happens, at which point the TCP socket gets hijacked.
I'm confused as to why HTTP/2 is mentioned. I guess they just mean a normal websocket?
Also, my heart goes out to the [big company] engineering manager asking in the GitHub thread if 1.7.x clusters are affected.
I personally found this issue and highly doubt anybody previously exploited it. It was discovered as a functional bug, and only later did I realize it had a security impact. So it wasn't that somebody got hacked first, and it wasn't found by a security researcher either. This issue is very nuanced and I don't think many people would have been looking for it (although they will now).
Although plenty of people are saying the sky is falling, this is mostly a privilege escalation issue, which means you first need valid access to do something harmful. So it's not a fabulous attack vector, because hard multi-tenant clusters are extremely rare. Searching for anonymous auth on kubelets is a better use of your time.
The project's Product Security Team adhered to the timeline indicated in the project's security release process: https://github.com/kubernetes/sig-release/blob/master/securi...
tl;dr a fix is (edit: optionally) sent out to a private distributors list under embargo within 2 weeks of disclosure, and public disclosure (with new releases) happens within 3 weeks of disclosure (with some discretion for timing to make sure it's not buried in a weekend or off-hours)
I can't speak to who knew about it when outside of the project, but I know the project acted expeditiously once the vulnerability was disclosed.
(Also, aside, props to everyone that got this rolled out so fast at the major providers.)