

TCP is harder than it looks - jsnell
http://jsnell.iki.fi/blog/archive/2014-11-11-tcp-is-harder-than-it-looks.html

======
longwave
This reminded me of the best TCP timeout story I've ever heard, the case of
the 500-mile email:
[http://www.ibiblio.org/harris/500milemail.html](http://www.ibiblio.org/harris/500milemail.html)

~~~
michaelx386
Great story with the added bonus of showing me the GNU Units command. What a
fantastic program, no more Googling when I need to convert stuff.
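
For anyone curious, the figure in that story is just light travel time; a rough back-of-the-envelope check in Python (the GNU Units invocation quoted in the story is roughly `units "3 millilightseconds" "miles"`):

```python
# Back-of-the-envelope version of the conversion from the 500-mile
# email story: how far does light travel in ~3 milliseconds?
c = 299_792_458            # speed of light in vacuum, m/s
meters = c * 3e-3          # distance light covers in 3 ms
miles = meters / 1609.344  # meters per statute mile
print(round(miles, 2))     # 558.85 - the story's "little over 500 miles"
```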

~~~
MrBuddyCasino
Indeed, I just checked OS X - sadly, units doesn't know about millilightseconds
("586 units, 56 prefixes").


~~~
taejo
GNU Units (available in Homebrew) is much more complete than the OS X version.

~~~
wmil
It still won't let me convert 'bytes / square mm' to my preferred data density
format, 'libraries of congress / football field'. So it needs work.

------
nabla9
What is the general lesson we should learn from this?

Postel's law, aka the robustness principle [1], can easily lead to accumulating
complexity when implementations adapt to the bugs in other implementations.
How could protocol designers mitigate this problem beforehand?

[1]:
[https://en.wikipedia.org/wiki/Robustness_principle](https://en.wikipedia.org/wiki/Robustness_principle)

~~~
cesarb
The lesson is "be strict in what you accept, from the beginning".

For instance, a lesson learned by the Linux kernel developers: when adding a
new system call to an operating system, if you have a flags parameter (and you
should, which is another lesson they learned), if any unknown flag is set,
fail with -EINVAL (or equivalent). Otherwise, buggy programs will depend on
these flags being ignored, and you won't be able to use them later for new
features.

But it has to be from the beginning; once the protocol is "in the field", you
can't increase the strictness without pain.
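
A minimal sketch of that rule (a hypothetical syscall-style API in Python, not the actual kernel interface; the `FLAG_*` names and `my_syscall` are made up for illustration):

```python
# Reject unknown flag bits up front, mirroring the kernel's -EINVAL
# convention for new system calls.
import errno

FLAG_NONBLOCK = 0x1  # known flag
FLAG_CLOEXEC = 0x2   # known flag
KNOWN_FLAGS = FLAG_NONBLOCK | FLAG_CLOEXEC

def my_syscall(flags: int) -> int:
    if flags & ~KNOWN_FLAGS:
        return -errno.EINVAL  # strict from day one: an unknown bit is set
    return 0                  # success; the remaining bits stay reusable

print(my_syscall(FLAG_NONBLOCK))        # 0
print(my_syscall(FLAG_NONBLOCK | 0x8))  # -22 (EINVAL)
```

Because bit 0x8 fails today, a future version can assign it a meaning without breaking any existing caller.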

~~~
sillysaurus3
_Otherwise, buggy programs will depend on these flags being ignored, and you
won't be able to use them later for new features._

I don't understand why that's true. Just add the new flag and use it for your
new feature. Aren't the buggy programs that were already sending the flag
responsible for their own bugs?

~~~
oranlooney
Yes and no. If you push an update and sysadmins all over the world dutifully
upgrade and immediately notice programs breaking, their first thought is not
going to be "oh, those programs must have had latent bugs." No, they're going
to blame you. Besides, having stuff work correctly is always more fun than
assigning blame, and validating flags is an easy way to avoid these scenarios.

~~~
sillysaurus3
Hm, fair enough. To be clear, I wasn't arguing against validating flags. I was
commenting about "This kernel function only gets 32 possible bitflags, but
since they never validated their flags, they can no longer add any additional
flags, ever, because it might break other programs which may or may not even
exist."

That sort of mentality seems like it would push designers in the direction of
poor design decisions. If a bitflag is the best design for a new feature, but
they're prevented from using it out of a sense of "Let's not ever break
anything ever," then the result may be a bad design that people are stuck with
for the next 50+ years, which seems objectively worse.

But my reaction is based on theory and not backed by experience, so it's
probably unfounded.

------
leonardinius
I have to confess I have quite limited knowledge of TCP/IP stack internals,
e.g. how stack extensions work, et cetera.

Does anyone know of any available online visual materials/tutorials? I'm
particularly searching for tools capable of recording and replaying TCP/IP
stack packets with visual representation, references to RFCs and
specifications.

~~~
agumonkey
The Coursera and Stanford networking MOOCs helped me understand networking more
deeply; you might try applying.

~~~
signa11
Which ones, if you could point them out, please?

~~~
1945795
[https://www.coursera.org/course/comnetworks](https://www.coursera.org/course/comnetworks)

[http://networking.class.stanford.edu/](http://networking.class.stanford.edu/)

------
jfuhrman
Pretty much everything in technology is harder than it looks. The other day it
took me a full day to configure Apache Solr for the first time, while my
estimate was just one hour.

------
ay
A SYN-ACK which does not get retransmitted, advertises a zero window, and
carries no options looks like some implementation of SYN cookies. A nice (and
useful in some cases) hack, but a violation of the TCP spec, which is why it is
disabled by default in most places that implement it.
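
The statelessness trick behind SYN cookies can be sketched roughly like this (a simplified toy in Python, not any real stack's exact encoding, which also packs things like the MSS and a timestamp into the cookie):

```python
# Toy SYN-cookie sketch: the server derives its initial sequence number
# from the connection 4-tuple and a secret, so it allocates no state
# until the final ACK of the handshake echoes the cookie back.
import hashlib

SECRET = b"rotating-server-secret"  # hypothetical; real stacks rotate this

def syn_cookie(saddr: str, daddr: str, sport: int, dport: int) -> int:
    digest = hashlib.sha256(
        f"{saddr}:{sport}>{daddr}:{dport}".encode() + SECRET
    ).digest()
    return int.from_bytes(digest[:4], "big")  # 32-bit ISN for the SYN-ACK

def ack_is_valid(saddr, daddr, sport, dport, ack: int) -> bool:
    # The client's final ACK must acknowledge cookie + 1; the server
    # recomputes the cookie instead of having stored it.
    return ack == (syn_cookie(saddr, daddr, sport, dport) + 1) % 2**32

isn = syn_cookie("192.0.2.1", "192.0.2.2", 40000, 80)
print(ack_is_valid("192.0.2.1", "192.0.2.2", 40000, 80, (isn + 1) % 2**32))
```

The cost of the hack is visible in the parent comment: whatever doesn't fit in 32 bits (options, a real window) gets dropped from the SYN-ACK.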

------
known
I miss [http://www.kohala.com/start/](http://www.kohala.com/start/)

------
Jabbles
Is there a history of these compatibility workarounds being the source of
security bugs (e.g. DoS)?

~~~
zwp
Yes. See, for example:

[https://en.wikipedia.org/wiki/Christmas_tree_packet](https://en.wikipedia.org/wiki/Christmas_tree_packet)

[https://en.wikipedia.org/wiki/Ping_of_death](https://en.wikipedia.org/wiki/Ping_of_death)

------
markb139
Electricity doesn't conduct at the speed of light. It's about 2/3 the speed of
light.

------
acqq
The author uses customized settings for his TCP stack and then laments that
some nodes on the internet which aren't under his control depend on more
common settings. Honestly, I don't see why he could have expected a different
outcome.

~~~
philh
The other node doesn't depend on the settings, it depends on not receiving a
duplicate SYN.

~~~
acqq
Note what he actually writes:

"Our TCP implementation has a variable base SYN retransmit timeout, and in this
case it was roughly 500ms. So most of the time the page load would fail with
our TCP stack, but succeed with an off the shelf one that had a SYN retransmit
timeout of 1 second."

If his timeout were 1 second, his connection with the other node _would work._
It's his own choice to insist on a "variable base SYN retransmit timeout" even
with nodes which weren't tested with such a setup. He also says:

"The options are either to tell a customer that some traffic won't work (which
is normally unacceptable, even if the root cause is undeniably on the other
end), or to water down a useful feature a little bit to at least fail no more
often than the "competition" does."

Meaning, the way they use that less common setting is "all or nothing": either
all clients or none, with the assumption that this way he will be
"competitive." Again, why is it then surprising to discover that "the whole
world" is not perfect, once you implement your part with the assumption that
it is?

Just because he knows what the "ideal case" is, he can't expect that everything
he confronts will be ideal.

~~~
viraptor
I don't think it was an issue with the timeout. The other side simply could
not handle a retransmitted SYN. At all. The 500ms comes up only because that
happened to be their RTT. If they used 1 second, this host would work, but
another host running the same buggy stack behind a 1-second RTT would continue
to fail.

It's not about the other host not having an ideal implementation, or about
testing only ideal cases. The other host has a bug which makes it unable to
cope with some situations. There's nothing they could do about it, apart from
tweaking the timeout to some bigger value.

~~~
acqq
I worked for many years with locations on remote islands with high RTT. You
wouldn't believe how much software from the biggest industry players failed to
work under those circumstances.

My favorite case was when one company first claimed "the device on the other
side is probably from the competition." It was their device on the other side
too. They "debugged" the case for the better part of a year and the case
remained unsolved.

I'm absolutely not saying that the far-side implementation was OK. I concur
that TCP is damn hard. It's just that you can't expect that "everybody else"
is perfect. You have to plan to handle the special cases or not be surprised
that you can't handle them.

~~~
viraptor
> You have to plan to handle the special cases or not be surprised that you
> can't handle them.

What do you propose, then? You send valid traffic and get a completely broken
response - I don't see any space left for handling special cases here. Like
they said in the article, it's the first packet and you have no information
about the other side yet. This is not something you can avoid or work around.

~~~
acqq
As soon as you think "I sent a valid packet, everything must work," you're
missing the point. Even the OP, at the end, worries about the actual client
communication _actually not working._ In the end, it's not about who's
"theoretically" right; it's "can you make it work given the real-world
limitations," which include implementations never tested against your "clever,
better-than-the-competition dynamic timeout modification." It can be clever,
but be cleverer still and plan for the cases where it runs up against
real-life limitations. And don't cry foul: it's you who moved into less tested
territory, expecting to be better than the "competition."

It's the same reason I don't complain about the herd reactions here. They are
probable, therefore expected.

~~~
viraptor
Sigh... let me say this again clearly, because you keep repeating the same
point. This case has nothing to do with the timeout modification. It will
happen with or without it. If you use a lower timeout you will run into the
issue more often; if you use a higher timeout, less often. But you will run
into the issue anyway, and there is no way to fix it.

So just tell us your proposed solution / the implied more-tested territory.
You try to start a connection, send a SYN, and don't see a response for X
(500ms, 1s, whatever you choose). What do you propose to do that isn't a
retransmit (which breaks the other side) or choosing new sequence numbers
(which breaks good behaviour on your side by starting two connections, and
possibly trips flood detection if you do it too often)?
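
The dilemma can be put in a toy model (illustrative numbers only; per the article, the real stack's timeout was adaptive):

```python
# Toy model of a handshake against a peer that mishandles a retransmitted
# SYN: the connection only succeeds if the SYN-ACK beats the client's
# retransmit timer, so any finite timeout leaves some peers unreachable.
def connect(peer_rtt_ms: float, syn_timeout_ms: float) -> bool:
    if peer_rtt_ms < syn_timeout_ms:
        return True  # SYN-ACK arrives before we retransmit
    # Timer fired, SYN retransmitted; the buggy peer now aborts the
    # handshake, so the connection never completes.
    return False

print(connect(peer_rtt_ms=450, syn_timeout_ms=500))   # True
print(connect(peer_rtt_ms=600, syn_timeout_ms=500))   # False
print(connect(peer_rtt_ms=600, syn_timeout_ms=1000))  # True - but a peer
                                                      # 1.2s away still fails
```

Raising the timeout only moves the threshold; it doesn't remove it.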

~~~
acqq
Yes, the buggy implementation is buggy in an idealistic sense. But that
doesn't matter. The fact is, that bug was de facto not visible until the OP
introduced the more "clever" (shorter timeout) code. If the node were really,
utterly problematic, it wouldn't be on the internet.

What to do in this case? Well, what do you want to do? Want it to work with
everything just the same? Then you have to start with a timeout of 1s, like
other TCP stacks do. That's what was tested, and then your behaviour towards
these nodes wouldn't differ from the rest of the internet. If you don't want
to do this for all of your connections (you'd be losing an advantage over the
competitors), then be ready to maintain a list of nodes where you observe the
behaviour, and assign a different starting timeout to them. Etc. There is
always a solution; the wrong approach is "it can't be solved because it's
against the RFC." The solution is making something work under real-life
limitations, not waiting for a world in which there's no bad implementation
(a state that's impossible to reach).

BTW, I have the impression we don't understand each other because you have a
"mathematical" approach (find one counterexample, my theorem is disproved,
wave my hands in the air in helplessness). I'm an engineer. There's a real
life out there. Everything you can imagine will have a real-life
counterexample. Deal with it. Whatever you do, it's your decision: ignore it,
adapt, whatever; it depends on what you want to do. Just don't stop at "the
problem is that others are wrong; I'm right and that's it." Unless you want to
do this for show. But then allow me to claim it's just a show.

Imagine Google saying around 2000, "we won't index badly formed HTML pages,
because they are against the standard, and our XML processor will _never_
accept them." There wouldn't be a Google today.

