TCP over TCP is a great idea for end user VPNs because 1) networks are generally pretty reliable and 2) NAT is everywhere. I regularly use an SSL VPN over crappy coffee shop wifi and tethered 3G connections. For the most part it works fine.
NAT is an argument against TCP, not for it, because when your NAT'ed IP address changes, TCP connections break. UDP will drift between NATs without skipping a beat.
And I'm very surprised you find crappy coffee shop WiFi and 3G to be pretty reliable - those are exactly the kinds of networks that have occasional sustained periods of high packet loss, which wreak havoc with TCP.
Edit: the only downside of UDP VPNs is that stateful firewalls can have extremely short timeouts of UDP "connections" (e.g. 30 seconds!), which forces the VPN to constantly send keepalives, which kills battery life on mobile devices. TCP connections tend to be given much longer timeouts.
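For the curious, the keepalive pattern looks roughly like this (a toy sketch; the server address, port, and 25-second interval are placeholders, not any particular VPN's defaults):

    # Keep a stateful firewall's UDP mapping alive by sending a tiny
    # datagram more often than the firewall's idle timeout.
    import socket
    import time

    VPN_SERVER = ("203.0.113.10", 1194)   # hypothetical VPN endpoint
    KEEPALIVE_INTERVAL = 25               # must beat the firewall's UDP timeout

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.connect(VPN_SERVER)

    while True:
        sock.send(b"\x00")                # real VPNs send a protocol-level ping
        time.sleep(KEEPALIVE_INTERVAL)    # every wakeup costs radio time and battery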
Completely agree with your righteous rage about TCP over bad connections. I allowed myself a bitter chuckle at the GP's "networks are generally pretty reliable" line.
Minor disagreement though:
"Edit: the only downside of UDP VPNs is that stateful firewalls can have extremely short timeouts of UDP "connections" (e.g. 30 seconds!)"
Plenty of mobile networks will time out inactive TCP connections in less than 30 seconds. TCP keepalives are an absolute requirement on long-running mobile connections - e.g. the GCM connections Google maintains on Android for notifications. A simple packet capture will show you the frequency of keepalives there, and it's almost always more often than once per 30 seconds.
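If you want to see the effect on your own sockets, something like this turns on kernel keepalives with an aggressive timer (a sketch; TCP_KEEPIDLE/KEEPINTVL/KEEPCNT are the Linux-specific names exposed by Python's socket module, and the 20/10/3 values are illustrative, not anything GCM actually uses):

    import socket

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 20)   # idle seconds before first probe
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10)  # seconds between probes
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 3)     # failed probes before the connection drops
    sock.connect(("example.com", 443))    # placeholder host; options apply before connect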
> Plenty of mobile networks will time out inactive TCP connections in less than 30 seconds.
Good grief - that is awful, but sadly believable. Do you happen to know who does that? A couple years ago I tested AT&T's 3G and found that the TCP timeout was 30 minutes, versus 30 seconds for UDP. I'd love to know numbers for other carriers.
Edit: Found an interesting paper from 2011 that tested 73 cellular carriers worldwide and found only 4 with TCP timeouts less than 5 minutes. The majority had timeouts greater than 30 minutes, and 21 had a timeout in the 5-30 minute range. Some of my faith in humanity has been restored. http://www.cs.ucr.edu/~zhiyunq/pub/sigcomm11_netpiculet.pdf (see page 8, table 5)
Timeouts that short violate RFCs. Established TCP connections can't be abandoned unless they've been idle for at least two hours and four minutes (RFC5382 REQ-5), and even UDP timeouts normally have to be at least two minutes (RFC4787 REQ-5).
If people are violating the RFCs then applications that detect it should probably start notifying users exactly why their battery life is suffering.
I see a lot of "should"s. When push comes to shove, though, a notification is not going to change the behavior of the network.
There's a lot of complaints I have about how some networks (especially mobile ones) deviate from RFCs and break specifications. A 30 second timeout is not nearly at the top.
Do you have any idea why the GFW blocks UDP by default? I can imagine that for corporate networks, but as far as I know, uninspectable UDP streams only recently came into existence with Google's QUIC (which assumes pre-negotiated encryption keys are still valid)
Why block all streams if you can inspect them (or at least their handshake)?
While practically no official information exists publicly, this appears to be the reason. My gut tells me that the lack of structure in UDP makes it a little harder to inspect too.
Do you have any information you can share about this? A few years ago, I could reliably use OpenVPN over UDP, as long as I switched ports out frequently. Some time ago (I don't remember when), this ceased to be the case, and I switched to PPTP and, more recently, Shadowsocks.
UDP used to work (~3 years ago) but currently it's blocked wholesale. OpenVPN over TCP gets throttled and blocked thanks to DPI too; an obfuscation layer is required because OpenVPN traffic is identifiable by a fairly distinctive encryption fingerprint.
I'm just saying in a previous life I used to spend a ton of time fighting with IPSec NAT traversal issues. With TCP encapsulation (e.g. SSL VPN), you don't have that problem. Most NAT firewalls do a good job dealing with TCP. Other protocols are more questionable.
When I'm using wifi at a coffee shop and start getting a bunch of packet loss, I will switch to a tethered 3G connection. When my SSL VPN reconnects, the VPN server hands me back the same IP address I had before. In some cases, my SSH sessions don't even drop.
IPSec is indeed hell with NATs, and an SSL VPN would be much better. But UDP is even better - most NATs do a good job with UDP too, and if done right, it's possible to switch Internet connections without the VPN having to reconnect.
In my experience, many NATs just drop UDP packets altogether, but still allow TCP through.
Similarly, I find the Internet connection at my local Starbucks to be one of the fastest, most reliable networks around; even faster than some of the local ISPs serving my home.
Note that any NAT that drops UDP packets altogether will basically disable DNS and VOIP type applications or at least degrade the experience in serious ways. I haven't come across many such NATs recently.
Me neither. I have come across networks that filter everything but a few TCP ports (like 80 and 443), but that's a matter of draconian firewall policy rather than a NAT limitation.
Did you read the linked article? (Or did anybody else who is replying?)
It's not about how TCP-over-TCP is somehow aesthetically displeasing or about how people should feel bad about doing it. It's about how TCP-over-TCP is a technically bad idea because stacked TCPs interact poorly. It's never a good idea. TCP-over-TCP is still a profoundly flawed protocol even if you don't happen to tickle its problematic cases.
I read the article. Yes, TCP over TCP can hurt performance in certain circumstances, because you have two unrelated congestion-avoidance loops running at the same time. Even so, in many scenarios TCP over TCP is an excellent idea: it can provide you with many benefits, works great in practical terms, and has no readily-available alternative that is better.
It doesn't solve the problem as theoretically neatly as possible, and carries some cruft. But here's hypothetical me at a Starbucks about to open an ssh tunnel to a trusted endpoint, and hypothetical you telling me that's never a good idea. Okay then, what should I do instead?
So sitting at Starbucks with my laptop -- what do I do? I am not aware of an option to have SSH run over UDP, although I do know that some VPNs allow you to use UDP instead of TCP.
Unless there is a relatively simple way of getting an encrypted tunnel for my HTTP traffic using tools like ssh and netcat and other things I'm likely to already have installed, I disagree with the notion that it's never a good idea.
When you are running a SOCKS proxy through ssh, you are not doing TCP over TCP. We are talking about things like OpenVPN, which can do TCP over TCP, but that is generally a bad idea. Its default mode is TCP over UDP, as it should be.
Only for apps that don't try to utilize maximum throughput. Skype, YouTube, most web browsing, most mail use.
But those that do - large file transfers over ftp/sftp or a very large email, for example - will cause the meltdown described in this article.
There are some TCP stacks that use RTT rather than packet loss as their congestion metric; those fare well under a TCP-over-TCP regime (but have other problems).
UDP (and datagrams in general) are not the only alternative to TCP-over-TCP. My sshuttle VPN uses TCP but avoids the TCP-over-TCP problem. https://github.com/apenwarr/sshuttle
TCP congestion control fundamentally depends on packet loss to know when to slow down. If the outer TCP makes sure packet loss doesn't happen - because if it does, it retransmits - then the inner TCP won't know what's going on, and will send as fast as it can, creating a mess.
The trick with sshuttle is that you terminate the TCP sessions at the server, and just send the raw data over the multiplexed link; there are no inner TCP headers anymore. Then you add them back at the other end by reconstructing a new TCP session. This eliminates the second layer of TCP congestion control inside the tunnel.
1) a TCP packet from source IP S comes in on side A of your tunnel
2) instead of acknowledging the packet, side A just sends its payload as data to side B over the other TCP connection (the ssh link)
3) the data may get lost, in which case the TCP connection between A and B retransmits
4) side B gets the data, forwards the packet to the destination IP D
5) D acknowledges, sends a packet to S
S --------- A ================= B ------------ D
When there is a lot of packet loss at step 3, the delay before S gets the acknowledgement sent at step 5 increases, and S sees the congestion - unlike TCP-over-TCP, where A acknowledges packets from S as soon as it gets them.
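To make the terminate-and-re-originate idea concrete, here's a toy sketch (not sshuttle's actual code - sshuttle also multiplexes many streams over one ssh link and manages backpressure). The point is that only payload bytes ever cross the A===B link, and B opens a brand new TCP connection to D, so there is never a second layer of TCP headers or congestion control inside the tunnel. Addresses and ports are placeholders.

    import socket
    import threading

    def pump(src, dst):
        # Copy raw payload bytes one way until EOF; no TCP headers cross the tunnel.
        while True:
            data = src.recv(4096)
            if not data:
                dst.shutdown(socket.SHUT_WR)
                return
            dst.sendall(data)

    def relay(a, b):
        threading.Thread(target=pump, args=(a, b), daemon=True).start()
        pump(b, a)

    def side_a(listen=("127.0.0.1", 1080), side_b=("192.0.2.2", 9000)):
        # S's TCP connection terminates here; its bytes ride the single A===B TCP link.
        srv = socket.create_server(listen)
        client, _ = srv.accept()
        tunnel = socket.create_connection(side_b)
        relay(client, tunnel)

    def side_b(listen=("0.0.0.0", 9000), dest=("198.51.100.7", 80)):
        # B re-originates a fresh TCP connection to D, with its own congestion control.
        srv = socket.create_server(listen)
        tunnel, _ = srv.accept()
        upstream = socket.create_connection(dest)
        relay(tunnel, upstream)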
> Because the timeout is still less than the lower layer timeout, the upper layer will queue up more retransmissions faster than the lower layer can process them. This makes the upper layer connection stall very quickly and every retransmission just adds to the problem - an internal meltdown effect.
Intuitively, this is obvious at an organizational level. If your boss is constantly micro-managing each piece of work (segment), and his (or her) boss is doing the same, you are going to have a meltdown.
TCP does what it's designed for: Reliable, ordered, stream connection with fixed end-points. UDP is the other extreme of this permutation. Wonder why no one explores the spectrum in between?
All of those properties are binary things -- it's either reliable or not reliable, ordered or not ordered, etc. The notion of mostly ordered or mostly reliable we sort of get "for free" with UDP. So then your question becomes: what about some properties and not others? There are ways around the fixedness of endpoints so I'll just look at reliability and ordered-ness.
Unreliable but ordered: use UDP, enumerate your packets, and if you get a packet out of order, discard it. No cheaper than UDP, and I can't really see a potential benefit over it.
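A sketch of that "enumerate and discard" filter, just to show how little it takes (the 4-byte big-endian counter is an arbitrary choice):

    import socket
    import struct

    def ordered_recv(sock):
        # Yield payloads in order; late or duplicate datagrams are silently dropped,
        # gaps are simply lost. Sequence wrap-around is ignored for brevity.
        highest = -1
        while True:
            packet, _addr = sock.recvfrom(65535)
            seq, = struct.unpack("!I", packet[:4])
            if seq <= highest:
                continue
            highest = seq
            yield packet[4:]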
Reliable but unordered: for this to make sense you have to impose a timeout (i.e. if you're willing to wait forever, UDP is "reliable" in the sense that you can never be sure you won't eventually receive that packet). So now you have ACKs and NACKs, and you essentially have TCP minus congestion control, where you don't bother to re-order packets based on sequence number. I can't really see the benefit of this either.
That said, there are many non-tcp-non-udp protocols out there. I just wouldn't say that the protocol-space is a spectrum with TCP on one side and UDP on the other -- there are many, many dimensions to look at.
Perhaps I'm missing something obvious, but isn't this trivial to solve by just having a deduping filter at the lower level? When the higher level keeps sending duplicate packets, just ignore them. Duplicate packets at this layer will always be useless, because your lower layer is already implementing reliability semantics. Then, when you get ACKs from the other side, translate those packets to match the sequence numbers that the higher level is expecting (i.e. the most recent sequence number that was sent out corresponding to a packet with those contents).
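To make the proposal concrete, a rough sketch of the dedup half (purely illustrative; it ignores sequence wrap-around, SACK, and the ACK-translation half of the idea):

    # Track, per inner flow, the highest sequence byte already handed to the
    # reliable tunnel, and drop inner retransmissions that carry nothing new.
    forwarded_upto = {}   # (src, dst, sport, dport) -> highest inner seq already sent

    def should_forward(flow, seq, payload_len):
        end = seq + payload_len
        if end <= forwarded_upto.get(flow, 0):
            return False   # pure retransmission: the tunnel already carries these bytes
        forwarded_upto[flow] = max(forwarded_upto.get(flow, 0), end)
        return True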