Inferring and hijacking VPN-tunneled TCP connections (seclists.org)
320 points by jedisct1 80 days ago | 126 comments



Disclaimer: I work at AWS, on Amazon Linux and our VPN products; those aren't impacted by this issue.

The attack that the researchers describe is very impressive, and using traffic analysis and error messages to find the details of an open TCP connection is extremely clever.

Unfortunately a similar approach can be used even more practically to target DNS on the VPN: https://www.openwall.com/lists/oss-security/2019/12/05/3

Encrypted DNS queries and replies can be profiled by traffic analysis, and the reply "paused", making it easier to ensure that a DNS spoofing attempt will succeed. This is a good reminder that cryptographic protections are best done end to end; DNSSEC does not help with this attack, because it does not protect traffic between the stub resolver and the resolver. It's also a good reminder that traffic analysis is still the most effective threat against network encryption.


Hi Colm, we're still working on a response to your email, but we appreciate the insight you provided and look forward to our conversation.

This disclosure only deals with the specific threat against active TCP connections, but there are more coming.


An actual high-quality DNSSEC implementation would protect against this. Using an untrusted stub resolver is a mistake; end-user OSes should verify DNSSEC responses directly.


Applications are the real "end" in end to end, and TLS is already e2e. Good TLS does thwart this attack; an attacker still can't generate a valid certificate.

It's taken a lot of focus and attention to make TLS reliable enough to make it a default in browsers, and DNSSEC is not particularly close. DNSSEC supports out-of-date cryptography and has no negotiation mechanisms to avoid them, which makes it very hard to use as a default e2e security protocol. It also doesn't encrypt anything.


Yes, it is correct that you cannot easily hijack connections that use TLS, but you can make inferences about active connections that use TLS. This is still fairly devastating for a large number of vulnerable users that rely on VPNs in nations with authoritarian information controls.


e2e TLS guarantees that you received a DNS answer from the DNS server you requested it from, no one was able to see the answer in transport and no one was able to change the answer in transport.

It doesn't validate that the answer provided was correct according to the owner of the domain your query was regarding. DNSSEC tries to provide this validation. Roughly it is similar to signing a message with PGP.

You can argue that DNSSEC is not up to today's crypto standards and no longer trustworthy for its intended purpose, but without some protocol that provides the guarantees DNSSEC tries to, e2e encryption is half a solution to DNS security.


The parent commenter definitely understands what DNSSEC is.

His point is that DNSSEC doesn't work the way you appear to think it does. Conceptually, it's meant to prove that a DNS record in a response is actually a record created by the owner of the zone. But in practice, the cryptographic signature isn't valuable to stub resolvers on the end system; instead, end systems trust their "DNS servers" (their resolving cache server) to perform DNSSEC validation for them. The success of that validation is conveyed in a single "AD" bit in the header of the DNS response.

Since this attack happens from a vantage point in between the end system and the resolving cache, DNSSEC isn't in play; the attacker will simply set the AD bit in their responses.
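To make the AD-bit point concrete, here is a minimal Python sketch (standard library only; the helper names are made up for illustration). It shows that "authenticated data" is a single unsigned flag bit in the 12-byte DNS header, so anything positioned between the stub and the resolving cache that can rewrite a response can also set that bit.

```python
import struct

AD_BIT = 0x0020  # "Authenticated Data" flag in the 16-bit DNS header flags word

def has_ad_bit(message: bytes) -> bool:
    """Return True if the AD flag is set in a raw DNS message."""
    _msg_id, flags = struct.unpack("!HH", message[:4])
    return bool(flags & AD_BIT)

def set_ad_bit(message: bytes) -> bytes:
    """Return a copy of the message with the AD flag forced on."""
    msg_id, flags = struct.unpack("!HH", message[:4])
    return struct.pack("!HH", msg_id, flags | AD_BIT) + message[4:]

# A bare 12-byte response header (QR=1, RD=1, RA=1) with the AD flag clear.
header = struct.pack("!HHHHHH", 0x1234, 0x8180, 0, 0, 0, 0)
forged = set_ad_bit(header)
print(has_ad_bit(header), has_ad_bit(forged))
```

Nothing in the message changes except that one bit; there is no signature over the header for a stub resolver to check.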


What end-user OS does this?


Fedora has a prototype:

https://fedoraproject.org/wiki/Changes/Default_Local_DNS_Res...

I ran it for a while. It was problematic from a UI perspective (nothing to do with DNSSEC per se — it was just immature), and it wasn’t fantastic with captive portals.

Also, it noticed that I visited an internet connection where the ISP was hijacking google.com. So it worked!


So by "actual high quality DNSSEC implementation", you mean one that exists only in prototype form for desktop Linux users.


I’m saying this should be done, not that it is done. The AD bit is garbage.


DoH actually addresses this attack, directly, regardless of whether the affected name is in a signed zone (most queried names aren't!), and it actually works on mainstream operating systems. Why would anyone waste even a moment with DNSSEC?


DoH moves the attack point to a different place - either between the recursive resolver and the authoritative servers, or (with DNSSEC) to the recursive resolver host itself. This is certainly a step up for security against malicious neighbors, but doesn't address malicious infrastructure.


DoH moves the attack we're talking about to a place where the attack can't be carried out.


Modifying DNS results can certainly be carried out ahead of or at the recursive resolver.

DoH without DNSSEC means the results are modifiable by authoritative servers, recursive resolver, and any of the links between them. DNSSEC with a local recursive resolver prevents this. (Of course if you want privacy from your ISP, you then have to tunnel the recursive lookups through VPN(s)).

If DoH is the best one can do, then do DoH. But it's wrong to shout down DNSSEC until an alternative method for authentication comes along.


The same argument applies to DNSSEC: if the DNSSEC authority server is compromised, it can fake records, the same way your DNS recursor can fake records if it's compromised. If you do DoH across the path, at each step, DNSSEC does literally nothing, rather than practically nothing.

Regardless: for the attack that we're discussing on this thread, DNSSEC is not relevant; or rather, this particular attack is a good illustration of how irrelevant DNSSEC is to realistic attack scenarios.


I wasn't aware that DoH was currently being touted for authoritative servers as well. So, comparing the best possible adoptions of each protocol to each other on their unique strengths:

1. DNSSEC (with clients doing full recursive lookups, ofc) is able to have zones signed off of the authoritative server, meaning a compromise of the authoritative server isn't an attack point.

2. With the client performing full recursive lookups themselves, DoH provides partial query privacy.

3. With the client delegating to a trusted third party (eg Mozilla), DoH provides full query privacy modulo that trusted third party (the TTP can break both privacy and integrity)

Comparing (2) to (1) I do see your argument much better now.

However, DNSSEC still has the property of E2E validation that can be used for more than what current software does. One could write a resolver that shipped records laterally between peers to extend its privacy properties beyond either DoH setup. Adopting this wouldn't require all the authoritative servers to get on board, just a community of end-users. This is where my argument is coming from, especially with DoH meaning (3) to most users and the general course of what happens to trusted third parties.


In reality, DNSSEC signers are online, and need to be, because the way you keep DNSSEC from revealing the full contents of your zone is to dynamically inject signed "whitelies" responses to queries. That, and then just the dramatic extra ops burden of keeping offline signing keys, means that the realistic real-world deployment of DNSSEC is, like that of TLS, based on online keys.

DNSSEC does not have E2E validation. DNSSEC is validated between the recursive cache server and the authority server. The protocol explicitly delegates validation away from endpoints with the AD bit. You can run a full resolver on your laptop the same way you can run a full-feed defaultless BGP4 peer on your laptop; it'll "work fine", but that's simply not how it's deployed in reality.

Another dramatic difference between DNSSEC and DoH is that DoH works whether or not zones sign (a tiny portion of the zones people actually make queries on are actually signed). Nobody needs "permission" to protect their queries with DoH, but everyone has to cooperate to make DNSSEC work.

Since the value DNSSEC provides is made more marginal with each passing quarter --- because of MTA-STS, because of multi-perspective CA DNS validation, because of DoH, because of LetsEncrypt making X.509 certificates free, because of certificate transparency --- the rationale for its continued deployment has become extremely thin. It's 1990s cryptography --- queries aren't even encrypted! --- that people are advocating we forklift into the Internet to solve... it's hard to see what problem?

A better plan would be to take DNSSEC back to the drawing board and come up with a modern alternative to it. DNSSEC itself is a failed protocol.


> DNSSEC does not have E2E validation

My comment compared the strengths of each protocol with the fairest interpretation of each. Your judgement here does not do this - it's obvious that a stub resolver relying on a third party to do verification is braindead, and clients doing a full recursive lookup is the correct answer. How clients are currently set up has little bearing on discussion of a protocol's properties.

> Nobody needs "permission" to protect their queries with DoH

This is also false if you compare the protocols on equal footing - if the authoritative servers are not speaking DoH/DoT, then queries are only partially protected. In order to do "DoH across the path" as you said above, cooperation is needed.

> A better plan would be to take DNSSEC back to the drawing board and come up with a modern alternative to it

Sure, but this becomes harder when things like DoH are touted as being a sufficient replacement...


It is not at all obvious that stub resolvers are "braindead" and the "correct answer" is full recursive lookups on the desktop. One way you know this is that no mainstream operating system works this way; another way you know it is that the DNSSEC designers explicitly took stub resolvers into account; yet another is that full recursive lookups eliminate caching, which the DNS depends thoroughly on.

I'm not interested in a debate about a fictitious version of DNS that you make up as the discussion progresses. I think we can probably just wrap up here.


You've written off the whole protocol because of 1990's cryptography. I think it's reasonable to just ignore the specific parts that don't require cooperation to change.

I would be interested in any stats that the DNS system actually "relies" on having clients share caches. Firing out UDP packets is a heck of a lot easier than a TCP/TLS session, and modern websites take the latter for granted for every single user.

If clients sharing a cache is actually important, that's actually a negative point for DoH/DoT as increased resource utilization means that major authoritative servers will be tempted to form a clique with major recursive resolvers, rather than everyone being able to query the zones directly.


The DoH protocol is not designed to run “at each stage of the path.”

Some things like DNSCurve do that, but somehow nobody got excited about them.


DoT is already being used [https://engineering.fb.com/security/dns-over-tls/]. The value proposition of DoH is to "blend" in the HTTPS traffic.

Not that DNSSEC is useless, but we should worry about tree validation AFTER having encrypted every stage of the path.


DoH with client-side DNSSEC validation would be nice. I don’t know whether the protocol supports this cleanly.


DoH doesn’t even attempt to address DNS tree validation, which is what DNSSEC does.

The two are complementary.


They're complementary in that one of them does something useful and directly addresses a real ongoing threat, and the other doesn't. It's true, those aren't the same things.


I run my own DNSSEC-validating DNS server for my whole local network (also on the road with DNS over TLS on my phone) using Pi-Hole (DHCP service + blocking ads) and unbound (local recursive DNS resolver that validates DNSSEC). I do DoT using nginx. So basically any Unix can run a resolver and virtually any OS can profit from it!


Why is this supposed to be interesting? You could run defaultless BGP4 on your Linux box if you wanted to badly enough. The question is "what end-user operating system already does this?".


I think some people assumed you were being imprecise in wording your question, because when there's a comment talking about what OSes don't do but should do, "What OS is able to do this?" makes more sense than "What OS already does this by default?".

So "Why is this supposed to be interesting?" is a bit rude for someone that was trying to answer a reasonable interpretation of your question.


You might be right. I really am frustrated by the mindset that says that because a Linux system administrator could get some feature to work, that means it's available to mainstream users; that logic really does suggest that you can do virtually anything on a desktop computer, which is technically correct but kind of negates the whole premise of the question.

But if I came off as personally rude, I apologize and will try harder not to do that.


Remember when Dropbox was first announced? Reminds me of that.


Everyone conveniently forgets that the dropbox comment had some very legitimate points, and the part that gets quoted and mocked was specifically about the benefit to linux users.

It's better not to bring it up.


https://news.ycombinator.com/item?id=9224

That comment there? Never seen it quoted, always in context.

It’s being brought up because it’s very a propos. Someone, somewhere said x problem needs a better solution, and someone else replied that it can be done on one specific Linux distro with a specific kernel version or configuration.

What am I missing here?


> Someone, somewhere said x problem needs a better solution, and someone else replied that it can be done on one specific Linux distro with a specific kernel version or configuration.

Hang on, what is 'it' in this sentence?

Because the dropbox comment was skeptical of the need for a "better solution". In this interpretation, 'it' is an explanation of how to solve the problem the old-fashioned way.

But the comment we're replying to agrees that we need the "better solution" of DNSSEC, and is suggesting a way to deploy the "better solution". In this interpretation, 'it' is the "better solution".

Those two ways of interpreting 'it' are opposites. The two comments are doing very different things.


Opposites are black and white, something and nothing. In conversation, you'll find that analogies are never the exact event they're being compared to, but something parallel enough to evoke a familiar emotion or memory. This isn't math, but you're approaching it as such.

The overlap here is simple. The question "What end-user OS does this?" was answered with...well, gibberish...and tptacek's reply resonated with me and reminded me of the dropbox comment. I think that's about as well as I'll ever be able to explain it. The fact that the two agree that there's a better solution is one facet of the discussion taken out of context, that doesn't even factor into my response or this whole spiel.

What I'd really like to ask though is what your motivation is for mounting such a defense. I seriously doubt it has to do with it having "very legitimate points" or you would've brought them up by now. Also, re-reading that thread, the OP ends up agreeing with everything except that it shouldn't be marketed as a USB replacement.

I completely stand by my decision to reference that comment in jest and will bring it up again!


> In conversation, you'll find that analogies are never the exact event they're being compared to, but something parallel enough to evoke a familiar emotion or memory. This isn't math, but you're approaching it as such.

I'm saying that the comments are barely similar at all. Yes, they both suggest how to do something on linux. That's the only similarity.

> answered with...well, gibberish...and tptacek's reply resonated with me and reminded me of the dropbox comment

But the dropbox comment isn't gibberish...

> What I'd really like to ask though is what your motivation is for mounting such a defense.

Because it annoys me when people misrepresent the commenter as a fool who couldn't see the value of Dropbox, too attached to some overly-complex system not applicable to normal users. He clearly did see the value of Dropbox. He said right there that it was "very good" for Windows users. And the mocked point was only one out of three.

> I seriously doubt it has to do with it having "very legitimate points" or you would've brought them up by now.

I didn't bring them up because I thought it was obvious, and it would be a waste of time to list them. But fine, I'll do it.

The post has three points:

The point about cobbling something yourself is a bad point. But it was very strictly limited in scope.

The point about not replacing USB drives is both correct and important.

The point about "not being viral" is agreed to be correct by dhouston, because the viral parts were secret at that time.

So that's two good points out of three.


I'm not an OpenBSD user and I'm not claiming OpenBSD represents any kind of end-user OS, but: my understanding is that the aim of their unwind[1] tool is to do this. And as far as I know, sending DNS UDP/TCP packets is more or less portable to anything with BSD sockets, although I don't know that anyone has tried running it anywhere except OpenBSD. So to the extent you consider Linux, MacOS, or even Windows an end-user OS, and to the extent that the tool is portable and could be configured... eh, there's the pieces of something there.

Anyway, I think DNSSEC is stupid, so I'm not advocating for using this tool or enabling it in OS's as default policy.

[1]: https://github.com/openbsd/src/tree/master/sbin/unwind


None.


I wonder whether this attack also works when the VPN device and whatever is being VPNed are put in a separate network namespace, which would have its own routing table.


I was thinking the same thing as I read the description. I see no reason that a separate VPN namespace would be vulnerable to this attack. The compromised device would be able to spoof packets with whatever IPs it wanted, but they would never be received in a context where the tunnel interface would be directly accessible and therefore the device would never see a response from that address, even if correctly guessed and probed.


This seems like the most popular solution, but there are concerns that it may disrupt routing, and also does not work on -some- systems.


Using DNSCrypt while using a VPN has become a common practice. It should prevent that specific issue.


Off-topic, but "service.example.com" would have been appropriate to use in the example in your mail. It's RFC-compliant. :-)


> Encrypted DNS queries and replies can be profiled by traffic analysis, and the reply "paused", making it easier to ensure that a DNS spoofing attempt will succeed

"Encrypted" only in the sense of encrypted by the VPN which you've sidestepped. An increasing fraction of DNS traffic from real clients will have been encrypted under DPRIVE, and so sidestepping the VPN doesn't help you spoof that. In this case the connection to a "real" resolver has TLS guarantees about integrity / authenticity and so you can piggyback DNSSEC assurances on top of that if that's how you choose to do things.

By the way, the error side channel is very much of this moment; it's something that, for example, very much concerned the developers of protocols like QUIC and TLS 1.3. The reaction reminds me of an anonymous criticism of Unix, using a car analogy, from the UNIX-HATERS handbook:

> ... If the driver makes a mistake, a giant “?” lights up in the center of the dashboard. “The experienced driver,” says Thompson, “will usually know what’s wrong.”

During TLS 1.3 development more than once contributors expressed a desire for a feature - maybe an optional feature - to get more verbose or detailed errors than those provided already by the protocol in hopes that it would ease debugging. Old hands correctly urged caution: the giant ? is one less weapon for bad guys.

If you send garbage to a WireGuard endpoint as I understand it nothing at all happens. QUIC is a little less tight-lipped, but still endeavours to ensure that a third party can't distinguish anything useful by injecting garbage and looking at what happens next.


The "?" story is about the error handling of the ed line editor. In the event of erroneous input, it just prints "?". The original reason for this is that Ken Thompson had better things to do than add code for nice error messages – initially the only people using it were himself and immediate colleagues who could just ask for his help in person if they got stuck. Also, the minimalist error handling was valuable in an era of extremely constrained resources (very limited memory, 300 baud modem connections, etc) – a smaller editor could be used to edit larger files, and briefer error messages made editing over slow connections faster. As soon as there was a need for a more user-friendly editor, people built new ones and left ed as it was, and those new editors soon had much better error handling.

To associate its spartan error handling with security is a bit of a retcon. Interesting analogy, but the minimalistic error handling of ed was not motivated by security concerns and had basically zero security benefits (unless you count poor usability as a form of security through obscurity.)


Ah, I have never tried to use ed interactively (I've worked with sed and I mostly edit in vi or its relatives but even the oldest machine I own has a video terminal so I don't need ed) but that makes sense as the source of the '?' error.

I'm aware that the story isn't about security, maybe that didn't come through in my post, it's just that the story always comes to mind when talking about how verbose to make error messages. As you illustrate, there are real trade offs, just today's trade-offs are different than in Ken's PDP programming days.


> If you send garbage to a WireGuard endpoint as I understand it nothing at all happens.

As far as I understand, the idea is to send garbage not to the VPN endpoint, but to any interface on the machine that the VPN runs on, with the VPN tunnel's IP as the destination.

The fact that the machine would even consider accepting that leaves me speechless.


You're correct in describing this attack, which is on the TCP/IP stack in various Unix-like operating systems.

I was describing the behaviour of rather newer systems like QUIC, TLS 1.3 and WireGuard which have decided that maybe discretion is the best option.

It seems so far I confused everybody who read what I wrote, (at least everybody who replied) so I apologise for that.


"This attack did not work against any Linux distribution we tested until the release of Ubuntu 19.10, and we noticed that the rp_filter settings were set to “loose” mode. We see that the default settings in sysctl.d/50-default.conf in the systemd repository were changed from “strict” to “loose” mode on November 28, 2018, so distributions using a version of systemd without modified configurations after this date are now vulnerable. Most Linux distributions we tested which use other init systems leave the value as 0, the default for the Linux kernel."

Anybody happen to know why systemd decided this was something they should be messing with?


On GitHub I found this commit: https://github.com/systemd/systemd/commit/230450d4e4f1f5fc9f...

The explanation in the commit message said this:

------

This switches the RFC3704 Reverse Path filtering from Strict mode to Loose mode. The Strict mode breaks some pretty common and reasonable use cases, such as keeping connections via one default route alive after another one appears (e.g. plugging an Ethernet cable when connected via Wi-Fi).

The strict filter also makes it impossible for NetworkManager to do connectivity check on a newly arriving default route (it starts with a higher metric and is bumped lower if there's connectivity).

Kernel's default is 0 (no filter), but a Loose filter is good enough. The few use cases where a Strict mode could make sense can easily override this.

The distributions that don't care about the client use cases and prefer a strict filter could just ship a custom configuration in /usr/lib/sysctl.d/ to override this.

------

I do not know enough about NetworkManager or sysctl parameters to completely understand if this is a valid reason or not, but this sounds like an acceptable explanation for the change to me.
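For anyone else unsure what the three rp_filter values mean in practice, here is a rough Python model of the RFC 3704 idea. It is a toy and not the kernel's actual algorithm: the routing table, interface names, and addresses are invented, it assumes a full-tunnel setup where the default route points at the tunnel, and "best route" is simply the longest matching prefix.

```python
import ipaddress

# Hypothetical full-tunnel routing table: (destination prefix, outgoing interface).
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "tun0"),        # default route via the VPN
    (ipaddress.ip_network("192.168.1.0/24"), "wlan0"),  # local Wi-Fi subnet
]

def best_route_iface(addr):
    """Interface of the longest-prefix route covering addr, or None."""
    matches = [(net.prefixlen, iface) for net, iface in ROUTES if addr in net]
    return max(matches)[1] if matches else None

def rp_filter_accepts(mode, src, in_iface):
    """Toy reverse-path check: 0 = no filter, 1 = strict, 2 = loose."""
    src = ipaddress.ip_address(src)
    if mode == 0:
        return True
    if mode == 1:
        # Strict: a reply to src would have to leave via the arrival interface.
        return best_route_iface(src) == in_iface
    # Loose: src only needs to be reachable through some interface.
    return best_route_iface(src) is not None

# A neighbor on the Wi-Fi spoofs a packet "from" an Internet host.
for mode in (0, 1, 2):
    print(mode, rp_filter_accepts(mode, "203.0.113.7", "wlan0"))
```

In this model only strict mode rejects the spoofed packet, because the route back to its claimed source goes out through the tunnel rather than the Wi-Fi interface it arrived on. Genuine traffic from the local subnet passes in every mode.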


`git blame` is your friend

sysctl.d: switch net.ipv4.conf.all.rp_filter from 1 to 2

This switches the RFC3704 Reverse Path filtering from Strict mode to Loose mode. The Strict mode breaks some pretty common and reasonable use cases, such as keeping connections via one default route alive after another one appears (e.g. plugging an Ethernet cable when connected via Wi-Fi).

The strict filter also makes it impossible for NetworkManager to do connectivity check on a newly arriving default route (it starts with a higher metric and is bumped lower if there's connectivity).

Kernel's default is 0 (no filter), but a Loose filter is good enough. The few use cases where a Strict mode could make sense can easily override this.

The distributions that don't care about the client use cases and prefer a strict filter could just ship a custom configuration in /usr/lib/sysctl.d/ to override this.

https://github.com/systemd/systemd/commit/230450d4e4f1f5fc9f...

There was a NEWS entry for this here: https://github.com/systemd/systemd/blob/230450d4e4f1f5fc9fa4...

As to _why_ they would make the change: systemd tries to be a project that gives distros a sane base system. Introducing this change was probably something they deemed useful in a base system. Especially in the context of laptops and mobile devices, this default seems sane.


> Systemd tries to be a project that gives distros a sane base system

Systemd tries to force a monolithic operating system to behave like a microkernel, and uses propaganda and manipulation to enforce whatever preference its designers have for how all systems should run onto all the Linux distributions it can. You can call that 'sane', I'll call it totalitarian empire-building.


It should be noted that setting the rp_filter to either 0 or 1 doesn't mitigate the entire attack. Just parts of it.


Correct. It does completely prevent the attack for IPv4, however, so we didn't notice the attack until the rp_filter settings were changed.

I can verify that turning rp_filter on in my use case is an acceptable solution on Manjaro and Ubuntu.

EDIT: To clear up the confusion below, I was saying that setting the rp_filter variable to strict mode does prevent this attack from working against IPv4, and in my situation, this is enough since I am not using IPv6 or any complicated routing on my network etc.


I don't understand why it would be different for IPv6. Are you just referring to the fact that there is no sysctl for reverse path filtering for IPv6? Does it still work with ip6tables -t raw -A PREROUTING -m rpfilter --invert -j DROP?


I'm confused!

You write that turning rp_filter to 0 or 1 prevents the attack? While the report says that parts of the attack can't be accomplished.

Does it completely prevent the attack, or only parts of it? This isn't clear when reading this comment, and the initial oss-sec disclosure.

This is important, as Arch Linux is contemplating whether or not to patch.


They edited their comment to clarify that only strict mode ("2") prevents the attack, against IPv4.


2 is loose. 1 is strict.


Mea culpa. 1 it is.


Let me paraphrase: "All the questionable Redhat-backed additions like systemd and Networkmanager aren't great or particularly fit and fixing them so plugging in network cables doesn't disrupt existing connections was too hard. Have downgraded security instead. If you care just re-enable and suffer our broken stack."

GCC 2.96, Pulse Audio, NetworkManager, systemd... I've assumed the death of (truly open) Linux would be at the hands of Redhat, this is just another stab wound.


The issue has nothing to do with systemd. Systemd just happened to enable a sysctl parameter that is needed for multiple network interfaces to work fluently in the first place. This is a problem in the Linux kernel networking stack or OpenVPN.


Sorry if I seem stupid, but what are the security implications?

As far as I understood, the attacker can:

- Detect an active VPN connection (and maybe close it/monitor it)

- Attempt to inject packets: for this part I am skeptical of the usefulness. The connection between the server and the client is normally encrypted, meaning that the injected packets will be dropped or can be used to forcefully close the connection, making it a DoS attack.


> The connection between the server and the client is normally encrypted

Keyword: Normally. What about DNS? DNS without DNSSEC via an untrusted AP cannot be trusted, that's for certain. Hence, you recommend that your users activate the VPN first to avoid such attacks, because now it's encrypted. Suddenly, that might change and your trusted-page.internal resolves to a different address. This /should/ be preventable via HSTS and certs, for example with HTTP, but those are assumptions again. Or what about RDP? It's not encrypted, so you hide it in a VPN - mostly due to the cleartext password. But suddenly there's a vector that might be able to inject data into an RDP stream inside a VPN connection.

> - Detect an active VPN connection (and maybe close it/monitor it)

Not just that. They can detect TCP connections inside the VPN connection. Hence, you can start tracking whether users of an access point access anti-gov.com even if they go through a VPN. It might be detectable on the client side and it should be possible to mitigate this via configuration, but that's still plenty scary.


Or in short, the authentication premise of VPN-routed traffic can be violated by an attacker without knowing the authentication keys, due to some misalignment of (in some cases, valid) routing behavior and VPN integration.


They can inject TCP data which looks — to the application — like it came over the VPN, but it didn't actually.

The vulnerability here is unencrypted TCP streams running (purportedly) over a VPN. Not TLS streams (HTTPS and HTTP/2.0+), unless you've also got a TLS 0-day.

(And maybe also unencrypted UDP sessions and unencrypted services, but that's less clear to me.)


Oh my god. I had to read this twice, but what you said here finally made me realize the severity of this. Thanks.

(I also now understand why the kernel developers might be looking at it as a vulnerability. Neighbors in your subnet should not be able to inject packets into your computer's internal routes.)


Perhaps this means that the attacker can insert traffic and the kernel handles it as if it came from the tunnel interface?

ELI5 would be in order indeed.


> This vulnerability works against OpenVPN, WireGuard, and IKEv2/IPSec

> It should be noted, however, that the VPN technology used does not seem to matter

I was wondering which VPNs, but after reading, it seems that the VPN doesn’t matter and that this is instead a kernel bug? I wonder if this also affects a host that has two network interfaces, with no VPN involved whatsoever?

Also, it’s good to see Jason Donenfeld involved, whom many will likely recognize as the author of WireGuard.


It's a routing behavior that is valid for non-VPN interfaces but bogus for VPNs, and it's this intersection of routing behavior and expected VPN behavior where there is a gap. So it's not a particular VPN issue or even a routing bug, exactly. My hypothesis is that routing layers need to allow VPNs to register/reserve interface addresses as "I'm a VPN, drop bogons," and that then VPN software should use these APIs. But I'm no expert in either field, so take that with a huge grain of salt.


Donenfeld has been great and makes a wonderful product. We've been using WireGuard as our primary VPN solution for a long time now and don't anticipate a change anytime soon.

I think this class of attack can best be mitigated at the kernel level, but unfortunately, side-channels exist anywhere that maintains state, and this might be a more general routing problem.


We're working around this CVE in WireGuard's wg-quick(8) with a rule something like:

    iptables -t raw -I PREROUTING ! -i wg0 -d 10.182.12.8 -m addrtype ! --src-type LOCAL -j DROP
where wg0 is the WireGuard interface and 10.182.12.8 is the local IP of the interface. This says to drop all packets that are sent to that IP address that aren't coming from the WireGuard interface. And it's done very early in netfilter, in the "raw" table.
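
For anyone who finds the negations hard to read, here is a rough Python sketch of the rule's match logic only. It is not how netfilter works internally, and `Packet`, `should_drop` and the `local_addrs` set are hypothetical names made up for illustration. The interface and address are the ones from the rule above.

```python
from dataclasses import dataclass
from ipaddress import ip_address

WG_IFACE = "wg0"                       # the WireGuard interface from the rule
WG_ADDR = ip_address("10.182.12.8")    # the local IP of that interface


@dataclass(frozen=True)
class Packet:
    in_iface: str   # interface the packet arrived on
    src: str        # source IP
    dst: str        # destination IP


def should_drop(pkt, local_addrs):
    """Mirror the three match conditions of the raw/PREROUTING rule."""
    arrived_outside_tunnel = pkt.in_iface != WG_IFACE          # ! -i wg0
    to_tunnel_addr = ip_address(pkt.dst) == WG_ADDR            # -d 10.182.12.8
    src_not_local = ip_address(pkt.src) not in local_addrs     # ! --src-type LOCAL
    return arrived_outside_tunnel and to_tunnel_addr and src_not_local


# Assumed set of addresses configured on this host (illustrative only).
local = {WG_ADDR, ip_address("192.168.12.50")}

print(should_drop(Packet("wlan0", "192.168.12.1", "10.182.12.8"), local))  # probe from the LAN side
print(should_drop(Packet("wg0", "10.182.12.1", "10.182.12.8"), local))     # traffic from the tunnel
```

All three conditions have to hold for the drop, so traffic that really arrives over wg0, and traffic the host sends to its own address, is left alone.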

I don't like having to use iptables in wg-quick(8), and so Willy Tarreau and I have been discussing a deeper kernel-level fix, which should be posted to netdev@ sometime soon.



Won't this break the use case of using Wireguard as a gateway to a private subnet, in the typical many-clients-one-server VPN setup?

For example, suppose I have a private physical subnet 10.12.0.0/24 (perhaps in an AWS VPC).

I want to allow clients to access these private hosts using a Wireguard VPN, so I set up a VPN with all clients having IPs from 10.34.0.0/24. Because I want these clients to have access to the private physical subnet, each client's config has

    AllowedIPs = 10.34.0.0/24, 10.12.0.0/24
Which adds both subnets to each client's routing table.

I add a new route for the VPC to send all packets destined for 10.34.0.0/24 to the central Wireguard "server", thus the Wireguard server acts as a gateway between the virtual 10.34.0.0 subnet and the physical 10.12.0.0 network.

The packets originating from the 10.12.0.0/24 hosts are not local, but I definitely want to route them onto the virtual 10.34.0.0/24 network.


I don't think the filter quoted in parent would stop this. In your example what it would stop is clients in 10.12.0.0/24 from connecting to the IP of the wireguard server itself (but not clients it routes to) on the 10.34.0.0/24 network (but not its IP on the 10.12.0.0/24 subnet).
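
To make that concrete, here is a small Python sketch that applies the same match conditions to the gateway scenario. The server addresses (10.34.0.1 on wg0, 10.12.0.5 on eth0), the host 10.12.0.20 and the client 10.34.0.7 are assumptions picked for illustration, and `raw_rule_drops` is a hypothetical helper, not anything from wg-quick.

```python
from ipaddress import ip_address

# Assumed addresses of the WireGuard "server" (illustrative only).
SERVER_WG_ADDR = ip_address("10.34.0.1")                   # its IP on wg0
SERVER_LOCAL = {SERVER_WG_ADDR, ip_address("10.12.0.5")}   # plus its IP on eth0


def raw_rule_drops(in_iface, src, dst):
    """Same three conditions as the quoted rule, with -d set to the server's wg0 IP."""
    return (
        in_iface != "wg0"
        and ip_address(dst) == SERVER_WG_ADDR
        and ip_address(src) not in SERVER_LOCAL
    )


cases = {
    "VPC host -> VPN client (forwarded)": ("eth0", "10.12.0.20", "10.34.0.7"),
    "VPC host -> server's wg0 IP": ("eth0", "10.12.0.20", "10.34.0.1"),
    "VPC host -> server's eth0 IP": ("eth0", "10.12.0.20", "10.12.0.5"),
}

for name, (iface, src, dst) in cases.items():
    print(name, "dropped" if raw_rule_drops(iface, src, dst) else "passed")
```

Only the middle case matches, because the rule keys on the server's own tunnel address as the destination, not on the whole 10.34.0.0/24 range.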


This might be fixable with the kernel's VRF functionality: https://www.kernel.org/doc/Documentation/networking/vrf.txt


I was wondering why wireguard suddenly gained a mandatory iptables dependency...


Yea... not nice at all. And now we'll probably have to simultaneously support nftables. This isn't a route I really want to go down. Suggestions on alternatives are welcome.


Two Ideas:

1. Does encouraging ORCHID addresses reduce the impact of enumeration attacks?

2. Linux at least has controllable behavior for cross-interface IP reachability, in the arp_filter/arp_announce/arp_ignore per-interface sysctls, and ip address scope, as exposed by iproute / netlink. Perhaps it's more proper for VPN addresses to be a scope 'link' address, instead of a scope 'host' address. Maybe a 'vpn' scope of some sort could be defined in future kernels, but I'm uncertain what that would do that a scope link address does not?


How plausible is it to sidestep iptables and inject the rules yourself directly to the kernel's interface? That gets rid of the dependency on a tool which adds friendliness you don't need plus a lot of baggage.

It seems as though (correct me if I'm wrong) the CVE requires an attacker to know or guess the target's IP on the VPN, they'll find out if they're right but they don't get a hotter/colder type feedback. So that opens the possibility to play a randomisation game. On IPv4 this is a very marginal benefit. But if a WireGuard client has been given a random 64-bit suffix for some particular IPv6 subnet then unless I misunderstand the attacker needs to probe all such suffixes until they find the correct one, and they can't realistically do that even on a fast network. If I'm right that's a pretty good mitigation (on IPv6).
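
Some back-of-the-envelope numbers, as a Python sketch. The probe rate of one million packets per second is purely an assumption (I have not measured what the attack can sustain), and `expected_scan_seconds` is a hypothetical helper. It assumes the attacker sweeps the candidates in some fixed order and the target is uniformly random, so on average about half the space is probed before a hit.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def expected_scan_seconds(candidates, probes_per_second):
    """Average time to find one uniformly random target in a linear sweep."""
    expected_probes = (candidates + 1) / 2
    return expected_probes / probes_per_second


ASSUMED_RATE = 1_000_000          # probes per second, an assumption

ipv4_hosts = 2**8 - 2             # usable hosts in a /24 such as 10.8.0.0/24
ipv6_suffixes = 2**64             # random 64-bit interface suffix

print("IPv4 /24, seconds:", expected_scan_seconds(ipv4_hosts, ASSUMED_RATE))
print("IPv6 /64, years:  ",
      expected_scan_seconds(ipv6_suffixes, ASSUMED_RATE) / SECONDS_PER_YEAR)
```

Even if the assumed rate is off by a few orders of magnitude, the gap between the two cases stays enormous, which is the point of the randomisation argument.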


Yeah, a random v6 address is not guessable, really.

But for v4? Start your search with private networks (10./8, 192.168./16, etc) and just enumerate /24s from .1, .2, .3 and I expect more often than not you'll do much better than chance.


Could changing route scope from link to host work?


sounds like a perfect candidate for eBPF?


> The access point can then determine the virtual IP of the victim by sending SYN-ACK packets to the victim device across the entire virtual IP space (the default for OpenVPN is 10.8.0.0/24).

So, would a bit of address space randomization mitigate step one for ipv6? fd00::/8 is pretty big, right? Even just picking a random IP in a random /64 from that /8 should help? Or am I missing something?
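
Picking such an address is easy with the standard library. This is only a sketch of the "random IP in a random /64 from fd00::/8" idea; `random_ula_address` is a hypothetical helper, and it says nothing about how the address then gets configured on the peers.

```python
import secrets
from ipaddress import IPv6Address, ip_network

ULA_LOCAL = ip_network("fd00::/8")


def random_ula_address() -> IPv6Address:
    """Return an address with the fd00::/8 prefix and 120 random bits."""
    prefix = int(ULA_LOCAL.network_address)      # 0xfd followed by 120 zero bits
    return IPv6Address(prefix | secrets.randbits(120))


addr = random_ula_address()
print(addr, addr in ULA_LOCAL)
```

The random bits come from `secrets` rather than `random` so that the address is not predictable from earlier outputs.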

Also interesting comments in the reply on the list regarding "policy based" VPNs. I wonder if some subset of that infrastructure could be used by Wireguard without complicating it all the way out of its current nice and secure-by-simplicity design?

https://seclists.org/oss-sec/2019/q4/123

> Only route based VPNs are impacted. In comparison, policy based VPNs are not impacted (On Linux only implementable using XFRM, which is IPsec on Linux specific) unless the XFRM policy's level is set to "use" instead of "required" (default)) because any traffic received that matches a policy (IPsec security policy) and that is not protected is dropped.


> I am reporting a vulnerability that exists on most Linux distros, and other *nix operating systems which allows a network adjacent attacker to determine if another user is connected to a VPN, the virtual IP address they have been assigned by the VPN server, and whether or not there is an active connection to a given website.

I'm not sure that I understand exactly what "network adjacent attacker" means. But I'm guessing that it means an attacker on the same subnet. And that it doesn't involve actually hacking VPN encryption.

But isn't it well understood that sharing LANs with untrusted neighbors is hugely risky? At least, I always segregate critical machines in protected subnets.

Or am I missing something?

Edit: OK, I was missing that this focuses on using VPNs via WiFi APs. And depends on the AP being malicious. So yeah, this is a huge issue, for that use case.


Huh, I wonder how the OpenBSD team will regard this one.

I'm unfamiliar with exactly how to tell what's in the base system and what's in ports, but I can see openvpn* entries over at http://ftp.openbsd.org/pub/OpenBSD/6.6/packages/amd64/ - does that mean there's arguably a hole in the base distro?

In any case, nice work.

Question. Besides randomizing packet lengths (preferably with minimum and maximum tunables (per each packet type) that each site can change, to further add entropy), what else can be done to mitigate against this family of attack?

Asking as someone interested in developing "ubiquitous secure container" type protocols that are application- and use-case-specific but (theoretically) high-stakes.


Best practice for preventing traffic analysis is to send encrypted messages of a constant size, at a constant rate. It's better to pad to a max size, than to randomize the packet length. Sending at a constant rate may not always be feasible or cost-effective, but it can still be useful to send data at a constant rate for a minimum block of time.
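
A minimal Python sketch of the pad-to-max-size part, assuming a made-up frame size of 1280 bytes and a 2-byte length prefix (both are my choices for illustration, not anything from a real VPN protocol). The padding has to happen before encryption so that every ciphertext on the wire has the same length.

```python
import struct

FRAME_SIZE = 1280      # assumed fixed plaintext frame size, in bytes
HEADER = 2             # 2-byte big-endian length prefix


def pad_frame(payload: bytes, frame_size: int = FRAME_SIZE) -> bytes:
    """Prefix the real length, then zero-pad up to the fixed frame size."""
    if len(payload) > frame_size - HEADER:
        raise ValueError("payload does not fit in one frame")
    padding = bytes(frame_size - HEADER - len(payload))
    return struct.pack("!H", len(payload)) + payload + padding


def unpad_frame(frame: bytes) -> bytes:
    """Recover the original payload from a padded frame."""
    (length,) = struct.unpack("!H", frame[:HEADER])
    return frame[HEADER:HEADER + length]


short = pad_frame(b"hi")
long_ = pad_frame(b"x" * 1000)
print(len(short), len(long_), unpad_frame(short))
```

Sending these frames at a constant rate is the other half, and that needs dummy frames (for example, empty payloads) whenever there is nothing real to send.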


It's just easy to implement constant rate, hence why it's best practice. But adaptive rate might be ok, like, for example, if you only pick from 4 different rates and adapt exactly every second, you only leak 2 bits of information per second. The question then becomes which leaking rate is safe enough in practice?
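
The 2 bits figure is just log2 of the number of rates divided by the adaptation interval. As a tiny Python sketch (`leak_bits_per_second` is a hypothetical helper, and it gives an upper bound that assumes the observer learns nothing except which rate was picked):

```python
import math


def leak_bits_per_second(num_rates: int, adapt_interval_s: float) -> float:
    """Upper bound on information leaked by the choice of sending rate."""
    return math.log2(num_rates) / adapt_interval_s


print(leak_bits_per_second(4, 1.0))    # the example above: 4 rates, adapt every second
print(leak_bits_per_second(4, 10.0))   # same rates, adapting every 10 seconds
```

So fewer rates or a longer adaptation interval both reduce the leak, at the cost of reacting more slowly to real demand.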


constant max rate tends not to be practical in remote roaming VPNs for cost and contention reasons. It would blast through typical mobile usage rates, for example, or get you throttled on a shared wifi network.


Ya - the NSA calls this "LINK MASKING". Makes a communication channel look busy 100% of the time at a predefined bit rate. Pretty much worthless to traffic analysis, at the cost of burning extra bits on the wire when not actually sending traffic. Mostly only used in Military applications, and specifically where you think an advanced adversary is going to do things like traffic analysis, etc.


> I'm unfamiliar with exactly how to tell what's in the base system and what's in ports, but I can see openvpn* entries over at http://ftp.openbsd.org/pub/OpenBSD/6.6/packages/amd64/ - does that mean there's arguably a hole in the base distro?

Packages are built from the ports tree.


> does that mean there's arguably a hole in the base distro?

No, any packages you see there are, by definition, not part of the base install. They are extra packages you can install later.


Ah, I see. So it's part of the package collection, not the ports tree, but it's not installed by default. Hence the "only two holes in a default install". Heh. Nice.

Thanks.


I just can't understand how this attack can be implemented in the real world. It not only needs adjacent network access to the VPN client but, most importantly, requires knowing the destination of a currently open TCP connection encrypted by the VPN and then guessing the right sequence numbers. Since most TCP connections also carry TLS these days, this attack is pretty useless in the vast majority of real-world situations I can think of.


> nping --tcp --flags SA --source-ip 192.168.12.1 --dest-ip 10.8.0.8 --rate 3 -c 3 -e ap0 --dest-mac 08:00:27:9c:53:12

Why is Linux accepting packets coming from one interface into an IP address belonging to a different interface? It feels like it is "forwarding" the packets internally, but `ip_forward` is turned off.

Is there any case where this behavior is legitimately useful?


IP addresses don't "belong" to interfaces in the general case. It's just a hard problem. In fact there are lots of multi-homed use cases where you want to internally route packets across interfaces without an affirmative mapping of what address is supposed to be used where.

For the specific case of point to point VPNs, there's a rule that makes sense. But that's not part of the network stack per se and there's no way to enforce it generically.


Do network stacks drop 127.0/8 packets from external interfaces today? Superficially (I'm not an experienced TCP/IP or routing stack developer, although I do work in the kernel) it seems like the same treatment could be used for VPN-registered interface addresses. You just need an API to specify "I'm a VPN interface" when the device is created or the IP assigned, no?
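
To show what I mean, here is a toy Python sketch of that check. It is not kernel code, and the "registration" is entirely hypothetical: `vpn_addrs` is a made-up mapping from a VPN-assigned address to the interface that owns it, standing in for whatever API the kernel might offer.

```python
from ipaddress import ip_address, ip_network

LOOPBACK_NET = ip_network("127.0.0.0/8")


def accept(in_iface, dst, vpn_addrs):
    """Decide whether a packet to a local address may be delivered.

    vpn_addrs maps a registered VPN address to the interface it belongs to.
    """
    d = ip_address(dst)
    # Loopback destinations are only valid on the loopback interface.
    if d in LOOPBACK_NET and in_iface != "lo":
        return False
    # Hypothetical extension: a registered VPN address is only valid on
    # its own tunnel interface (or lo, for locally generated traffic).
    owner = vpn_addrs.get(d)
    if owner is not None and in_iface not in (owner, "lo"):
        return False
    return True


registered = {ip_address("10.8.0.8"): "tun0"}   # assumed registration
print(accept("wlan0", "127.0.0.1", registered))
print(accept("wlan0", "10.8.0.8", registered))
print(accept("tun0", "10.8.0.8", registered))
```

Addresses that were never registered keep today's behavior, so multi-homed setups that rely on cross-interface delivery would not change.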


Is there a place where I can read about these cases?


From the referenced systemd commit,

> such as keeping connections via one default route alive after another one appears (e.g. plugging an Ethernet cable when connected via Wi-Fi).


How is this supposed to work? How will the packet destined to the WiFi IP address get to the Ethernet interface?


The kernel internally routes it to the logical IP address. Since it's an internal address, the packet never goes down to the NIC.


What's the configuration you're talking about? In the Wifi+Ethernet case, how do the routers know to send the packets towards the "right" interface, without the computer having the "right" IP address?

I mean, suppose the computer has WiFi IP address 10.0.0.3 & Ethernet IP address 10.0.0.5, then after NAT the return packets will go to 10.0.0.3, and therefore should go to the WiFi interface, not to the Ethernet interface (or, if they don't, how do they know which interface they should go to?).


> The described attack utilized a malicious router.

I understand how cross-interface packets can be used maliciously. I'm just trying to figure out the non-malicious use cases for them.


Suppose you have a VPN server that routes traffic between several offices. It has tun0 with 192.168.0.1/24 linked to the New York office and tun1 with 192.168.1.1/24 linked to the London office.

The server also runs some service, say ssh, and you have a name for it in the DNS that resolves to one of its IP addresses. When you type "ssh vpn-server.example.com" it should work regardless of whether you're in New York or London, right?

If 192.168.0.42 can reach 192.168.1.42 by routing through the VPN server then it should generally also be able to reach 192.168.1.1 on the VPN server itself.


> how do the routers know to send the packets towards the "right"

The described attack utilized a malicious router.

I imagine, in theory, that any middle router (such as your ISP) could then be used for such an attack. Imagine Comcast being able to inject their garbage [0] into even VPN sessions. Or a government actor that Comcast is known to route for.

[0]: https://tools.ietf.org/html/rfc6108


I believe you're interpreting the question wrong.

This isn't "How does the packet get fixed?", it's "How did a packet going to the WiFi IP get transmitted to the Ethernet port in the first place?"


I use this behavior in production systems where I have 'well-known' RFC1918 addresses I use for service bootstrapping/configuration. In the network engineering world, extra loopback interfaces are also used for similar reasons.


It seems for the attacker to find out about active connections to any given website they have to already know the IP of the website and then brute force the virtual ports. Being able to determine what website the target has an active connection to without prior knowledge would probably take even more brute forcing. A small solace.


This is a very noisy attack, for sure.


Correct me if I am wrong, but...

If I just make sure that incoming packets that are destined for the VPN LAN are dropped, this attack does not work?

Of course there are such rules in our firewalls??

Is everyone walking around without any firewall filtering nowadays? How is this a bug? Maybe I am just stupid. Did I miss something?


TCP/IP stack was dropping this by default .. until systemd decided to switch the default https://news.ycombinator.com/item?id=21713479


The default behaviour of the kernel is no rp filtering at all. Older versions of systemd enabled strict rp filtering, no doubt causing the same sort of complaints from people who like to complain about that sort of thing. Newer versions relaxed this to loose rp filtering for the reasons explained in the commit message.
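
For anyone unsure what the three modes check, here is a toy Python model of the source-address test: 0 is no validation, 1 (strict) wants the best route back to the source to use the interface the packet arrived on, 2 (loose) only wants some route to the source to exist. The routing table, interface names and the 203.0.113.10 address are assumptions for illustration, and the sketch says nothing about whether strict mode fully stops the attack.

```python
from ipaddress import ip_address, ip_network


def best_route(routes, addr):
    """Longest-prefix match; returns the outgoing interface or None."""
    a = ip_address(addr)
    matches = [(net, iface) for net, iface in routes if a in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def rp_filter_accepts(mode, routes, src, in_iface):
    """Model of the reverse-path check for modes 0, 1 (strict) and 2 (loose)."""
    if mode == 0:
        return True
    out_iface = best_route(routes, src)
    if out_iface is None:
        return False
    if mode == 2:
        return True
    return out_iface == in_iface


# Assumed table on a VPN client: default route through the tunnel.
routes = [
    (ip_network("0.0.0.0/0"), "tun0"),
    (ip_network("10.8.0.0/24"), "tun0"),
    (ip_network("192.168.12.0/24"), "wlan0"),
]

# A packet claiming a remote source, but arriving on the Wi-Fi interface.
for mode in (0, 1, 2):
    print(mode, rp_filter_accepts(mode, routes, "203.0.113.10", "wlan0"))
```

With a default route present, loose mode accepts essentially any source address, which is why it behaves much like no filtering for this kind of packet.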


Disseminating vulnerabilities like these to less savvy people should be a critical imperative for the next century. I imagine a team of animators at a non-profit or foundation that illustrate how these attacks work at different levels of technical backgrounds.


I'm having trouble understanding why the randomly sprayed bits from the attacker to the client are even accepted at all at the crypto boundary. Shouldn't it hard-fail (not decrypt) due to an invalid key?


The bits aren't VPN-framed packets; the attack isn't VPN specific. They're just sending ordinary TCP setup packets and ordinary TCP stream packets with fudged parameters until they find the right ones by brute force. It's a really noisy attack that uses weaknesses in packet routing to find existing TCP sessions and inject into them.

It's also, like, an escalation from a relatively high privilege level. You need both a passive network observer (compromised router or ISP) and a noisy, active LAN device (to inject IP addresses that a router would filter as bogons). That's not to say this is crazy hard; these are definitely within the reach of a motivated attacker. Routers were the original the-S-in-IoT-is-for-security, and if you've got an IoT type device on the LAN it's probably vulnerable once you've popped the router.


Is this mitigated by iptables filter firewall? For example, FORWARD policy DROP and no other rules?


Yes, if you are doing firewalling properly. The problem is that most people aren't and are just using their distribution defaults.


Does anyone know if the affected distros have made any kind of announcement?


Can we merge these? I posted this 10 hours earlier here: https://news.ycombinator.com/item?id=21709693


There's really nothing useful there that isn't also here.


"We discovered a vulnerability in Linux, FreeBSD, OpenBSD, MacOS, iOS, and Android..."

What about NetBSD?

This also appears to be wireless only, i.e., the attacker needs to have control over the AP. Am I reading this wrong?


I was reading it wrong. It is not wireless only. Technically, it only requires that the victim is sharing a local network with the attacker.


This doesn't seem nearly as bad as 'hijacking' implies.

> The attacker can now inject arbitrary payloads into the ongoing encrypted connection using the inferred ACK and next sequence number.

First, it's hard to get to this point. Then, you're injecting garbage because you don't know the payload encryption keys. So it's just disruptive, even though yes technically it is 'hijacking'.

Unless I missed something, of course! I found the writeup to be vague on the payload encryption point. It should have explicitly stated the impact one way or the other.


The impact is to unauthenticated TCP connections over encrypted VPNs. The attacker can inject arbitrary content into the TCP session, without knowing the VPN keys. If you are using an authenticating protocol for your cross-VPN traffic, such as TLS, then your connection over the VPN isn't vulnerable.

If you are relying on the VPN encryption to protect unauthenticated communication, then you're SOL. The vulnerability isn't in the VPNs themselves, precisely, but in the way VPNs and packet routing interact.


For your first point, this isn't hard if you're in the appropriate position (adjacent or upstream), as the post details. There are already tools available that can do all of this, example invocations are also there.

For your other point, they don't need to know the keys (unless the traffic travelling /over/ the VPN is /also/ encrypted). That's the whole point. This is about being able to trick the machine into accepting traffic for a connection, from an interface that the connection's traffic isn't travelling over. If it's plaintext within the VPN, this sidesteps the VPN interface and you can indeed inject your own malicious plaintext traffic into that connection.


1. Yes, you have to already be in position. That's limited to specific actors. Once that is achieved, you have to probe for specific IP addresses the victim might be connected to. IOW a list you are targeting. Then the port probing and seqno guessing.

These facts reduce the impact because it's not just "be a guy on the internet", eg like if there were an open database of PII sitting there for the taking, only needing discovery to find it. In no way am I claiming the attack isn't feasible. It's definitely a real risk, beyond the theoretical.

2. Thanks, got it. That makes more sense.

I think I actually like this vuln. It reinforces the need for defense in depth. It reduces a takeover to an annoyance (could be critical for some apps, yes) assuming you use TLS at the app layer.


> That's limited to specific actors.

1) Anyone who can compromise the residential gateway in your home

2) Anyone & anything connected to the same home network as you (incl. any IOT devices; and no, they don't need Internet access)

3) Anyone who can compromise the residential gateway in whatever coffee shop you happen to be in

4) Anyone & anything connected to the same coffee shop network as you

5) ...

... and, like I said, easily scriptable, with the tools already available to carry it out.

It's not a doomsday scenario, but it is pretty bad. The one saving grace is that most apps these days use some kind of application-level authentication and/or encryption, e.g. TLS.



