
TCP Sucks - reacocard
http://bramcohen.com/2012/05/07/tcp-sucks
======
jandrewrogers
Designing networking protocols that are robust in a mathematical sense is
unbelievably difficult. In fact, if you dig through the mathematics
literature, we humans have found optimal solutions in only a few cases. Many
real-world networking protocol design scenarios have no known
non-pathological implementation. Furthermore, there are many decentralized
protocol designs that we can prove to have poor qualities. To bludgeon the
equine: people who can significantly advance our understanding of such
things tend to win Nobel prizes and similar. It is that difficult.

That said, TCP is not the best we can design given everything we know about
designing network protocols. It was good enough for the people who designed
it at the time, and possibly (my chronology is fuzzy) was approximately as
good as the mathematics would have reasonably allowed when it was developed.
We can make it work well enough in many cases -- the economics of inertia.
Some narrow use cases are better solved differently, but those solutions are
not general.

It is one of those problems that sounds like it should be easy to solve on the
surface but turns into a bloody epic challenge once you start to dig into it.
I am not offering a solution, just noting that very few people can.

~~~
its_so_on
For people who don't know what the parent is talking about, take a simple
example:

<http://en.wikipedia.org/wiki/Two_Generals_Problem>

My summary:

 _The two generals problem proves that, if there is any nonzero probability
of message loss, two people cannot coordinate to both be in state 1 at
sunrise tomorrow (attack!!!) or both in state 0, in such a way that it is
100% mathematically guaranteed that they both believe this has been
coordinated (and an uncoordinated attack would be a catastrophic loss for
them).

(In other words, the guarantee must be such that the 0-1 or 1-0 outcome by
sunrise tomorrow -- one general thinking the attack has been coordinated
with certitude and attacking, while the other thinks the attack has not been
confirmed with certitude and does not attack -- is a mathematical
impossibility.)

Take a simple approach. The following packets are all encrypted, but any or
all may be lost.

1) first general sends: "Let's both be in state 1 tomorrow (coordinated
attack). Since an uncoordinated attack is so catastrophic to us, I will only
enter state 1 if I receive your reply. Please include the random number
25984357892 in your reply. As soon as I get this the attack is ON. If I don't
get such a packet within the hour I will assume this post was intercepted
(lost), and I will send another. I will remain in state 0 until I receive that
packet."

2) second general sends: "Got your packet with 25984357892. This is my
acknowledgment! I will attack as well. In case you don't get this, I know you
won't attack thinking I didn't get your message, so I am sending this message
continuously."

Great. But what if all messages from the second general to the first are
intercepted? Now the first thinks all of HIS were intercepted (he has
received no acks) and doesn't attack, but the second one does. Failure.

So, we have to emend 2) to:

2) second general sends: "Got your packet with 25984357892. This is my
acknowledgment! I will attack as well. In case you don't get this, I know you
won't attack thinking I didn't get your message, so I am sending this message
continuously. In case you don't get any of THESE messages, however, I will not
attack. Therefore acknowledge ANY of them with random number 458972984323..."

Oops. What if all of the first general's acks of the acks are intercepted or
lost? (Perhaps the first general is able to send messages until receiving
2), but just as he gets 2), conditions change and he no longer has any of
his messages delivered.)

Now the first general thinks he has acknowledged the ack, but the second
general doesn't even know whether his ack-(cum-request-for-an-ack-back)
message was delivered..._

and so it goes...

Of course, in practice you can simply say: "Let's do this for a certain
number of acks of acks of acks, 3 let's say, and then just keep sending the
same ack to each other, assuming that if the connection was reliable enough
to get three deep, it will be reliable enough for one of the final acks to
make it through." That's a mathematically false assumption (what guarantee
do you have that, if 3 of your encrypted messages made it across, at least
one of the next 217 you send by sunrise, all carrying the same message,
will?), but a reasonable one.

So it is not a practical problem; it is a mathematical problem. Although you
cannot, with certainty, do something as simple as "let's agree to both be in
state 1 (or neither if we fail to agree), OK?" over a less than perfectly
reliable connection, if the connection has any reliability at all you can
get to within a practical level of confidence.

Once you realize that, PROVABLY, you can't even do the most mundane things
no matter what, the mathematics the parent is talking about doesn't seem all
that interesting anymore. :)

~~~
paulsutter
While it's true that network protocols are neat, challenging puzzles with
nontrivial solutions, the hardest parts end up being mundane: how any given
protocol change interoperates with all the existing implementations out
there, especially changes to congestion control, and across the range of
optional protocol features.

Uncertainties introduced by packet loss are actually pretty easy to work
past.

------
viraptor
Ok, maybe I'm missing something, but reading the article I see some weird
ideas:

"RED is hard to deploy, so let's change the base protocol instead." How does
that make sense? Everyone would have to start using new libraries, and for
backward compatibility we'd have to preserve the TCP layer too. That means
standards like HTTP would have to get extensions to use SRV records, or
suffer delays while uTP availability is probed.

There's also a complaint that RED will drop packets once the queue is full.
I don't get that at all - it will always happen...

In addition, I get the impression there is some tension / implied
superiority between "us" (people doing uTP) and "them" (the ones doing RED).
Why does it look so ugly? There's a known problem, there's an interesting
solution for new software (uTP), and some plan to migrate old protocols
transparently (RED). When did that turn into some bizarre conflict, and why?

~~~
bramcohen
BitTorrent is using uTP just fine, which is only, you know, most of the upload
from consumer internet connections, and we're working on getting the same
things crammed into TCP with LEDBAT, but that's a slow process.

I wasn't complaining about RED dropping packets, just describing how it works.

As for the tension, my point is that my solution works and the other one
doesn't. If you want to know why the person I quoted was being such a
dismissive jerk, you'll have to ask him.

~~~
viraptor
Ok, I don't understand this part then: "With RED the router will instead
have some probability of dropping the packet based on the size of the queue,
going up to 100% if the queue is full." That seems to be universally true
for any queue with limited capacity. If it's full, it's going to start
dropping packets - whether it drops those already in the queue or the new
arrivals doesn't matter. Any queue which is full has to drop as many packets
as are coming in.

Is there some reason this was described as a specific behaviour here?

~~~
stock_toaster
I believe the focus is more on the first part of that statement. A
"standard" queue only drops packets when it is actually full. RED drops
packets while the queue is still non-full, based on a calculated
probability, and the likelihood of a drop rises with the queue size until
the queue is full (at which point 100% of new packets drop).

Apparently.
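
(A minimal sketch of that drop rule; the thresholds and the linear ramp are
textbook RED ingredients, simplified here -- real RED works on an
exponentially averaged queue length and has extra correction terms.)

    import random

    MIN_THRESH = 20   # packets: below this average queue size, never drop
    MAX_THRESH = 80   # packets: at or above this, drop everything
    MAX_PROB = 0.1    # drop probability as the queue nears MAX_THRESH

    def red_should_drop(avg_queue_len: float, rng: random.Random) -> bool:
        if avg_queue_len < MIN_THRESH:
            return False
        if avg_queue_len >= MAX_THRESH:
            return True
        # Drop probability ramps up linearly between the two thresholds.
        ramp = (avg_queue_len - MIN_THRESH) / (MAX_THRESH - MIN_THRESH)
        return rng.random() < ramp * MAX_PROB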

------
caf

      The solution is for the end user to intervene, and tell all
      their applications to not be such pigs, and use uTP instead
      of TCP. Then they’ll have the same transfer rates they
      started out with, plus have low latency when browsing the
      web and teleconferencing, and not screw up their ISP when
      they’re doing bulk data transfers. 
    

That still doesn't address the problem when you have many users behind the
same queue, some of whom care only about throughput and not latency. You need
a scheme which will work when all of those users are acting selfishly.

~~~
scott_s
My thought was more fundamental than that: any solution which involves asking
users to request different transport protocols is not going to solve the
problem. There are far more users who have no idea what a "transport protocol"
is than those who do.

With that said, I enjoyed the post. It's an interesting problem, and I do find
the base idea attractive: allowing applications to opt to be background
traffic.

~~~
nkohari
It wouldn't (at least in my understanding) be the user that would choose, it
would be the application. WoW for example would optimize for latency, whereas
BitTorrent would optimize for throughput.

~~~
scott_s
Correct, but Bram's argument (as I understand it) was that the users would put
pressure on the application developers to opt to be background traffic.

~~~
nkohari
Two nearly-universal truths about users who suggest solutions: they're rare,
and they're usually wrong. :)

It's more likely that users will state the problem -- such as, "I want to be
able to run WoW and BitTorrent at the same time." From there, the software
developers would determine the solution (optimizing for latency vs.
throughput).

------
stephenbez
If anyone is curious what uTP is, you can find the protocol defined here:
<http://bittorrent.org/beps/bep_0029.html>
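
(For a rough feel of what the BEP specifies, here is a toy LEDBAT-style
controller in Python. It is a sketch only: the constants and names are
illustrative assumptions, not the BEP's exact values, and the real spec
derives delays from its timestamp and timestamp-difference header fields.)

    TARGET_DELAY_US = 100_000   # aim to keep ~100 ms of queuing delay
    GAIN_BYTES_PER_RTT = 3000   # cap on window change per round trip
    MIN_WINDOW = 1500           # never shrink below one packet

    class DelayController:
        def __init__(self) -> None:
            self.base_delay_us = None  # lowest one-way delay seen so far
            self.window = 10 * 1500    # congestion window, in bytes

        def on_ack(self, one_way_delay_us: int, bytes_acked: int) -> None:
            # The minimum observed delay approximates an empty queue.
            if self.base_delay_us is None or one_way_delay_us < self.base_delay_us:
                self.base_delay_us = one_way_delay_us
            queuing_delay = one_way_delay_us - self.base_delay_us

            # Positive when under the target (grow), negative when over
            # (shrink): back off before the queue overflows, unlike
            # loss-based TCP.
            off_target = (TARGET_DELAY_US - queuing_delay) / TARGET_DELAY_US
            self.window += GAIN_BYTES_PER_RTT * off_target * (bytes_acked / self.window)
            self.window = max(MIN_WINDOW, self.window)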

------
volatile
The author seems to claim that it is implausible for a router vendor to sell
a router that drops more packets.

      The marketing plan is that because router
      vendors are unwilling to say ‘has less memory!’ as a
      marketing tactic, maybe they’d be willing to say
      ‘drops more packets!’ instead. That seems implausible.

Yet he concludes by suggesting the router should drop all the packets.

      The best way to solve that is for a router to notice
      when the queue has too much data in it for too long,
      and respond by summarily dropping all data in the
      queue. /snip/ Of course, I’ve never seen that proposed
      anywhere…

Based on his earlier reasoning, that would also be implausible.

~~~
bramcohen
That's what you would do IF you were going to be serious about making the
router drop packets in a way which actually helps. I don't expect it to happen
any time soon.
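
(A toy sketch of that "drop the whole queue" policy, with made-up
thresholds; as the article notes, no shipping router is known to do this.)

    import time
    from collections import deque

    MAX_BYTES = 64 * 1024    # "too much data in it"
    MAX_STANDING_SECS = 0.2  # "for too long"

    class FlushingQueue:
        def __init__(self) -> None:
            self.packets: deque[bytes] = deque()
            self.bytes = 0
            self.over_since = None  # when the queue first exceeded MAX_BYTES

        def enqueue(self, packet: bytes) -> None:
            self.packets.append(packet)
            self.bytes += len(packet)
            now = time.monotonic()
            if self.bytes <= MAX_BYTES:
                self.over_since = None      # queue drained; reset the clock
            elif self.over_since is None:
                self.over_since = now       # just went over; start the clock
            elif now - self.over_since > MAX_STANDING_SECS:
                # A standing queue: summarily drop all data in it.
                self.packets.clear()
                self.bytes = 0
                self.over_since = None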

------
nuje
It seems to me that at the IP level the net has been in a technological
paralysis for some time.

We can't get RED or IPv6 deployed, and the IETF doesn't seem to make
anything useful happen these days.

edit: does anyone else remember when layer 3 had a bright future ahead of
it, and IPv6 and end-to-end IPsec (with keys in the DNS) were just around
the corner...

~~~
bramcohen
uTP carries the bulk of all BitTorrent transfers at this point. This would
seem to imply a certain level of success.

------
endymi0n
Having uTP as an alternative means of network connection when UTP already
names a cabling standard is rather unfortunate - I imagine a lot of people
getting confused to the max, especially as the two are pronounced the same.
So pretty pretty please: give the protocol a GOOD name first, then we're
talking business! ;-)

------
josefonseca
I think the catchy title was meant to grab attention to an important present
day issue.

But TCP actually does not suck, it's been there for longer than I have and
served us pretty darned well up until now.

Never forget that when the TCP protocol was designed, the biggest concern we
had was that a nuke would land on top of our heads at any minute and the
network should keep working. Also, the "Internet" was thought to be a small
niche network of networks among the military and academics.

I guess this is all well known, it's just my reaction to the editorialized
title.

------
msbhvn
Just to be clear, "TCP Sucks", despite successfully running the majority of
the global Internet traffic. "TCP Sucks" so bad we're basically going to copy
a lot of it: window based congestion control, SACK, timestamps, ability to add
new options, etc. "TCP Sucks" because it is not perfect and has an issue, an
issue that requires router / switch upgrades. We're going to fix that by
breaking backward compatibility with _tons_ of applications and requiring an
OS update on _every_ client and/or application. All this assuming our
relatively new and unproven thing is as good as TCP in all other ways and
fixes this issue of TCP perfectly.

Hmmm. Methinks that TCP does not suck so much.

~~~
bramcohen
Did you read the article or just the headline?

~~~
msbhvn
I did read it, guess I just got too caught up in the title to brush it off
(looks like some others here also wondered about that).

Anyway, uTP looks cool, LEDBAT sounds very interesting and BitTorrent is of
course, completely awesome. I just don't think TCP sucks. I'm constantly
surprised at how well it works for how simple most of it is and how
complicated and intractable the rest is.

------
hristov
Shameless plug:

ExtremeTCP.com is the solution to the congestion problems of TCP. The best
part of ExtremeTCP is that it is not a new protocol. It is TCP. It just uses
clever algorithms on the sender side to send data while avoiding congestion.
(Since TCP does not actually specify which algorithms one should use, as
long as one avoids congestion, ExtremeTCP is a perfectly legal version of
TCP.)

Yes, I am involved with this. If you are interested in testing, please send
an email to the contact address on the website.

~~~
sams99
There is also <http://www.fastsoft.com/home/>, which professes to do the
same. Personally, I worry about any non-documented, non-public congestion
control protocols; many years have been spent in academia researching this
subject... It is easy to be "fastest" - just disable congestion control
altogether - the trouble is that tons of things will break. In order for me
to use a different congestion control algorithm in production, I would need
some experts to review the protocol to ensure I am being a good web citizen
and not breaking the internet.

~~~
hristov
That is the initial reaction of most people, but it is not actually the
case. If you disable congestion control, you will be the fastest for a
little bit (maybe a hundred milliseconds) and then you will get drowned in
dropped packets. In other words, you will be sending data fast, but packets
will start dropping on the way, so the data will not be received fast.

When we say we are the fastest, I mean we are the fastest at sending packets
all the way through. This is not easy at all, and it is not as simple as
sending packets as fast as possible.

We are very good at modulating our speed in order to have full speed while
avoiding dropped packets. Thus, in many situations, we send data slower than
other TCP versions but the other versions get into trouble and start dropping
packets.

There is one universal rule for TCP congestion avoidance algorithms and that
is that as soon as you notice a dropped packet, you have to stop and wait for
the congestion to clear up. If you do not do that, you will break the
internet. But we do follow that rule; furthermore, we avoid large numbers of
dropped packets in the first place.
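
(For reference, a minimal sketch of the textbook form of that rule,
Reno-style AIMD: additive increase, multiplicative decrease. This is the
standard behavior, not ExtremeTCP's undisclosed algorithm.)

    MSS = 1460  # maximum segment size, in bytes

    class AimdWindow:
        def __init__(self) -> None:
            self.cwnd = 10 * MSS  # congestion window, in bytes

        def on_ack(self) -> None:
            # Additive increase: roughly one segment per round trip.
            self.cwnd += MSS * MSS // self.cwnd

        def on_loss(self) -> None:
            # Multiplicative decrease: a drop signals congestion; halve.
            self.cwnd = max(MSS, self.cwnd // 2)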

We have tested our software with other standard connections and it does play
well with others.

As someone else noted, Fastsoft is respected by the industry, and it is well
established that they do not break the internet. We are about 30% faster
than Fastsoft.

------
dfc
Is the shout-out the mere mention of BitTorrent? Or does Nick Weaver allude
to Bram some other way in the full article?

~~~
bramcohen
Who do you think made the development he's referring to happen?

~~~
dfc
I obviously know who is behind bittorrent that is why I mentioned it as a
possible alternative. But you yourself state that Stanislav Shulanov is the
man behind utp. More importantly shout-outs traditionally explicitly reference
the person/thing in question. I was not sure if there was a
longer/muddier/more-controversial back story that would not be appropriate to
discuss in an acm article.

------
thespin
I'm surprised he doesn't mention CurveCP. He's taken ideas from that author
before (e.g. netstrings, which you'll find in the .torrent file format).
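
(For the curious: a netstring is just a length-prefixed string, djb's
"<length>:<data>," framing. Bencoded strings in .torrent files use the same
length-prefix idea, minus the trailing comma. A two-line sketch:)

    def netstring(data: bytes) -> bytes:
        # djb's framing: b"hello" -> b"5:hello,"
        return str(len(data)).encode() + b":" + data + b","

    assert netstring(b"hello") == b"5:hello,"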

TCP does suck if you try to use it for lots of short-lived connections. And
that pretty much sums up how it's being used nowadays, most of the time.

For single, long-lived connections, TCP is fine.

~~~
bramcohen
uTP isn't based on CurveCP, and CurveCP is nowhere near as mature as uTP is.

~~~
jlouis
That said, CurveCP is a quite interesting protocol! It just needs more support
and testing.

------
shaggyfrog
> Of course, I’ve always used TCP using exactly the API it provides, and even
> before I understood how TCP worked under the hood gone through great pains
> to use the minimum number of TCP connections used to the number which will
> reliably saturate the net connection and provide good piece diffusion.

BitTorrent must not have any books on copyediting.

