
Understanding IP, TCP, and HTTP - danieleggert
http://www.objc.io/issue-10/ip-tcp-http.html
======
zAy0LfpBZLC8mAC
How I love it when people without deep knowledge of some subject write
authoritative sounding articles.

Without guarantee of completeness, to avoid the spread of misinformation:

\- IPv6 fragmentation has nothing to do with some "minimum payload size"
(whatever that is) - there simply is no fragmentation being done by routers,
the sender still can fragment however it pleases, and presumably will do so
whenever it has to send a packet that doesn't fit through the path MTU.

\- The end points use Packet Too Big ICMPv6 messages to determine _path_ MTU,
which is different from just "the MTU".

\- With IPv4, the sender chooses whether a router will fragment when the
packet exceeds the next-hop MTU or whether the router should drop the packet
and send a Fragmentation Needed ICMP message - where the latter again is used
for path MTU discovery.

\- Path MTU discovery is useful because it allows the sending IP
implementation to push the chunking higher up the stack when the sending
higher-level protocol has the capability (as is the case with TCP, but not
with UDP, for example), which tends to produce lower overhead. Unfortunately,
some clueless firewall administrators, such as those responsible for AWS EC2,
do filter all ICMP because they for unknown reasons consider it to be bad,
thus breaking PMTUD, which can lead to hanging TCP connections.

\- TCP sequence numbers are for bytes, always, with the special case of SYN
and FIN also counting as "bytes" in the sequence, but never for segments.
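
To make the byte-counting point concrete, here is a minimal sketch of how the sequence number advances. The function name `next_seq` and the initial sequence number 1000 are made up for illustration; only the rule itself (payload bytes plus one each for SYN and FIN, modulo 2^32) comes from TCP.

```python
SEQ_SPACE = 2**32  # TCP sequence numbers are 32-bit and wrap around


def next_seq(seq, payload_len, syn=False, fin=False):
    """Return the sequence number that follows a segment.

    Every payload byte consumes one sequence number. SYN and FIN
    each consume one as well, as if they were a byte in the stream.
    The number of segments sent plays no role.
    """
    consumed = payload_len + (1 if syn else 0) + (1 if fin else 0)
    return (seq + consumed) % SEQ_SPACE


isn = 1000                                      # hypothetical initial sequence number
after_syn = next_seq(isn, 0, syn=True)          # SYN carries no data but counts as 1
after_data = next_seq(after_syn, 500)           # 500 payload bytes
after_fin = next_seq(after_data, 0, fin=True)   # FIN counts as 1
print(after_syn, after_data, after_fin)
```

Sending those 500 bytes as one segment or as five segments of 100 bytes ends at the same sequence number, which is the whole point.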

~~~
unethical_ban
I love it when people without deep knowledge of a subject try to learn about
it and explain themselves to others.

~~~
stusmall
This is an important part of the way I learn. I will read something and then
explain it to someone else. It makes me think deeper about the issue as I form
the words and it gives me a great chance to get corrected when I am making
unfair assumptions. I always preface these conversations with "as I understand
it" or "from what I read" or some other disclaimer. I used to have a coworker
who would give me soooo much guff about these disclaimers since I'd drop so
many of them in one of these conversations. I just felt it was important to
make it clear I wasn't coming from a place of authority and more from the
perspective of a guy who is bumbling through it and trying to figure out what
the hell is going on.

~~~
Jugurtha
It's the way I learn, too.. But it doesn't make me write things as a guy who's
knowledgeable.

I'm a total noob, yet the first few paragraphs made me cringe because I felt
there were some odd things. I had a weird feeling about it. It wouldn't have
bothered me if there wasn't this "A periodical about _best practices and
advanced techniques_ in Objective-C"..

Or using the word "great contributors", etc. I mean, one has to be humble
because unless one _really_ knows his stuff, he shouldn't talk that way.

If the writing style was more in the "I'm learning and journaling my
progress", it would've been more than okay, and knowledgeable people wouldn't
have a problem with it.

When I was on forums learning to design my PCBs, I'd post my design and ask
for _feedback_ , and people who'd spent 30+ years at it would comment and
point out a thousand flaws in what I thought was nice.
And I got back to work, iteration after iteration.. Until these really great
guys who do that for a living would say "Beautiful work".

Had I posted something like "advanced PCB design" in the "this is how it's
done" way, they'd have ignored me and I would've stayed more ignorant than I
still am.

There was a question on the Python mailing list asking how long it takes to
say that one knows how to program. People with 40+ years actively programming
said: I'll let you know when I'm there.

Humility goes a long way. Heck even when I read things on the nmap mailing
list, I don't feel that tone that they consider they know more than you do
even though they really, really know their stuff.

~~~
ohblahitsme
Could you post an example of the "I'm learning and journaling my progress"
writing style? I'd like to start doing this and I don't want to come off as an
expert on things I'm just learning.

~~~
mattikl
One thing is to not publish it -- a learning journal is probably much more
important for you to write than for anyone to read. Then give yourself a
couple of years or decades of learning time, and if you still want to write
about it, what you wrote as a beginner will give you valuable insights into
the beginner's mind, things you have probably forgotten.

And of course you can publish it (might be good for feedback), just state that
it's a learning journal, not "best practices".

~~~
Jugurtha
Great idea. I have a notebook where I write down ideas for companies, things I
think about. I think it is a really, really good practice to write it down..

The reason I'm saying that is that human beings have selective memory. They
tend to remember things they did the right way, they remember their good
ideas, times they were right, etc.

I used to note my ideas that would seem genius.. And then I'd look at them a
couple months later and it's humbling. How stupid could I be.

But there is a good thing about this: It taught me a valuable lesson.. It
taught me to focus on real needs, and not some fancy thoughts I have at 3AM.
Like real needs.

And I know that at an early stage, one needs to let go of critical things and
be open and not dismiss ideas, etc.. But it's just that some ideas are plain
stupid and I had plenty of those.

I write them down, then cross things. Not a real need, not a problem. Now I'm
thinking about an idea that I'd use if it were available. And I'm not the only
one.

------
baddox
I've heard that a good way to gauge a person's general technological literacy
is to simply ask "what happens when I type a URL in a browser and hit Enter?"
Obviously, the question is deliberately open-ended, and any step in the
process can be broken down into more detailed steps (up to a point). I'd like
to see an article that initially shows high-level steps (e.g. DNS request,
HTTP request, server processing, HTTP response, parsing and rendering), but
allows each step to be expanded progressively with increasing detail.

~~~
pbhjpbhj
[Deeper]

The Pauli exclusion principle prevents electrons with the same quantum
characteristics from entering the same space, this interaction occurs across
the amassed copper atoms which form the majority of the metallic wires that
approach one another in the internal structure of the keyboard ...

------
norswap
If you want to further your understanding of network protocols, there's an
excellent open textbook available here:
[http://cnp3book.info.ucl.ac.be/](http://cnp3book.info.ucl.ac.be/)

~~~
maaaats
At Uni we had a book called "Computer Networking - A top down approach". One
of the best teaching books I've ever read. The amount of detail is very nicely
balanced, and as the title says it's a top down approach where one layer at a
time is discussed. Very interesting.

~~~
greyskull
We're using that same book, sixth edition, in my networks course right now.
It's overly verbose like every other textbook, but the content is solid.

------
mrtbld
_> There’s a misconception that restarting the (HTTP) request will fix the
problem. That is not the case. Again, TCP will resend those packets that need
resending on its own._

But that's not true if the connection is interrupted at the socket level,
right?

For example, if the device switches from 3G to Wi-Fi, or from Wi-Fi to wire,
then I believe, its hardware address changes, its IP address changes and the
socket becomes stale. But the TCP connection, would it be closed right away or
would it hang until some timeout? (And does it depend on the OS?)

~~~
zAy0LfpBZLC8mAC
The layers are conceptually independent, and in a way even the concept of
"switches from 3G to WiFi" is a misconception.

The TCP socket doesn't know anything about any "interfaces" or "links" or
anything like that, it only knows about its and the remote IP address (and
port), and the IP stack will deliver any packets to it that it receives that
are addressed to that port on that address coming from the corresponding
remote address and port, no matter which link it was received through
(possibly subject to reverse path filtering on end hosts as a security
measure). Similarly, each outbound packet is routed independently, so if the
routing table changes half-way through a TCP connection, packets simply will
be routed via a different link (the end host really just does the same as any
other router does, and the fundamental idea of packet switched networks is
that routers do not know about connections, they simply forward each packet
independently, potentially switching links as needed at any time).
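
A toy model of that lookup, as I understand it: the connection table is keyed only by the address/port 4-tuple, and the interface a packet arrived on is not part of the key. The class `Demux`, the interface names, and the addresses (taken from the documentation ranges) are all invented for the sketch, and it ignores things like reverse path filtering.

```python
from collections import namedtuple

Packet = namedtuple("Packet", "src_ip src_port dst_ip dst_port payload")


class Demux:
    """Toy stand-in for the part of an IP stack that finds the TCP socket."""

    def __init__(self):
        self.connections = {}  # 4-tuple -> list of delivered payloads

    def open(self, local_ip, local_port, remote_ip, remote_port):
        key = (local_ip, local_port, remote_ip, remote_port)
        self.connections[key] = []
        return key

    def deliver(self, packet, interface):
        # `interface` is deliberately not used for the lookup: the socket
        # is identified by addresses and ports only.
        key = (packet.dst_ip, packet.dst_port, packet.src_ip, packet.src_port)
        queue = self.connections.get(key)
        if queue is None:
            return False  # no matching connection
        queue.append(packet.payload)
        return True


stack = Demux()
conn = stack.open("192.0.2.10", 50000, "198.51.100.5", 80)
first = Packet("198.51.100.5", 80, "192.0.2.10", 50000, b"part 1")
second = Packet("198.51.100.5", 80, "192.0.2.10", 50000, b"part 2")
stack.deliver(first, interface="wifi0")
stack.deliver(second, interface="eth0")  # link changed, same connection
print(stack.connections[conn])
```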

It is perfectly possible, for example, to bridge between WiFi and wired ethernet,
and have a gateway that routes some IP network onto that Ethernet/WiFi, then,
while connected to the WiFi, establish a TCP connection, disconnect from the
WiFi, connect to the Ethernet via cable, using the same IP address on the
Ethernet interface as you previously used on the WiFi interface, and the TCP
connection will survive that just fine (it might take a moment for the router
to time out its neighbour cache entry and re-resolve your IP address into the
new hardware address, but that's just a matter of a few seconds). You could
even connect to both, configure things such that the kernel only replied to
ARP/ICMP6 ND on WiFi, say, and route outbound packets through the cable, then
the outbound packets of the TCP connection would go through the cable while
the inbound packets would go through the air.

The only thing that actually breaks a connection is when packets addressed to
the address that your TCP connection is using cannot reach you anymore, or
when packets you send using that address can not reach the other side anymore,
for example because you send them through a link that does not allow you to
use that address. The latter really is mostly what kills TCP connections on
mobile phones: the default route gets changed from WiFi to 3G, say, and your
mobile provider won't allow you to continue sending through their network
packets using the address you got assigned by the WiFi - so the connection
hangs even if the WiFi interface might actually still be up and able to
receive packets addressed to that address.

One important thing to notice in this: There isn't really any way a TCP
implementation could detect right away that any of this has happened, as it
cannot know what the filtering policies of your provider(s)/network(s) are or
whether you disconnected only temporarily or whether you will reconnect to a
different access point to the same network ... - so, when some mobile platform
kills TCP sockets when you "change from 3G to WiFi" that really is a dirty
hack that makes a load of assumptions about some typical setups that don't
necessarily hold true.
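
Since TCP itself cannot tell you right away, what an application can do is put its own deadline on reads, so that a silent peer shows up as a timeout instead of a hang. A minimal sketch on loopback; the helper name `recv_with_deadline` and the timeout values are arbitrary choices for the demo, not a recommendation.

```python
import socket


def recv_with_deadline(sock, nbytes, seconds):
    """Return received bytes, or None if nothing arrives within `seconds`.

    A return value of b"" still means what it always means for recv():
    the peer closed the connection cleanly.
    """
    sock.settimeout(seconds)
    try:
        return sock.recv(nbytes)
    except socket.timeout:
        return None


def demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()
    try:
        silent = recv_with_deadline(client, 1024, 0.2)  # peer sends nothing
        server_side.sendall(b"hello")
        answer = recv_with_deadline(client, 1024, 1.0)
        return silent, answer
    finally:
        for s in (client, server_side, listener):
            s.close()


silent, answer = demo()
print(silent, answer)
```

Note that a timeout only tells you that nothing arrived in time, not why, which matches the point above that the cause is not visible from the end host.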

------
laichzeit0
If you want to learn about IP, TCP, UDP and some of the protocols below this I
would highly recommend reading Richard Stevens' book TCP/IP Illustrated, Volume
1: The Protocols.

For two reasons: It's probably one of the best introductions to the subject
that has ever been written, and it's a model example of how a technical book
should be written.

I'd be hard pressed to find a reason not to go this route at least once in
your life. I know the material pretty well but I still re-read Stevens' book
every few years just because it is so good.

~~~
roel_v
"I'd be hard pressed to find a reason not to go this route at least once in
your life. I know the material pretty well but I still re-read Stevens' book
every few years just because it is so good."

Then again, that's a lot of effort to spend on something that the vast
majority of us don't need to know in much detail. The main reasons for knowing
all the details are

\- to write a new networking stack, or working on an existing one;

\- to write or maintain server software or routers or caches or other software
directly involved in networking;

\- to break or exploit existing software.

(obviously 'because it's interesting' is a _valid_ , but not _practical_
reason to know)

~~~
mje__
If you write anything that communicates over a network (e.g. anything using
HTTP), you need to know at least some of this stuff, otherwise you're not
going to be able to explain why (for example) your service call latencies have
a big spike around 200ms.

------
rubiquity
It's nice to see this recent increased emphasis on Web/mobile developers
understanding the technologies that link it all together. The next thing I
would add is a high level overview of the sockets API. While these topics
aren't critical to most day-to-day lives of developers, they are certainly
useful to understand.

~~~
marai2
This is a very readable online book on networking and sockets:
[http://beej.us/guide/bgnet/output/html/multipage/index.html](http://beej.us/guide/bgnet/output/html/multipage/index.html)

Talk about understanding the sockets API ;-) here's the content section for
chapter 5:

    
    
      5.1. getaddrinfo() — Prepare to launch!
      5.2. socket() — Get the File Descriptor!
      5.3. bind() — What port am I on?
      5.4. connect() — Hey, you!
      5.5. listen() — Will somebody please call me?
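
For a feel of how those calls fit together, here is a rough Python equivalent on loopback. Python's `socket` module wraps the same system calls under the same names; the uppercase-echo server and the function names `serve_once` and `demo` are just invented for this sketch.

```python
import socket
import threading


def serve_once(listener):
    """Accept one connection and echo back the request in upper case."""
    conn, _ = listener.accept()
    with conn:
        conn.sendall(conn.recv(1024).upper())


def demo():
    # getaddrinfo(): turn host and port into address structures.
    family, socktype, proto, _, addr = socket.getaddrinfo(
        "127.0.0.1", 0, family=socket.AF_INET, type=socket.SOCK_STREAM
    )[0]

    listener = socket.socket(family, socktype, proto)  # socket(): get the descriptor
    listener.bind(addr)        # bind(): port 0 lets the OS choose a free port
    listener.listen(1)         # listen(): start accepting connections
    server_addr = listener.getsockname()

    worker = threading.Thread(target=serve_once, args=(listener,))
    worker.start()

    client = socket.socket(family, socktype, proto)
    client.connect(server_addr)  # connect(): the three-way handshake happens here
    client.sendall(b"hey, you!")
    reply = client.recv(1024)

    client.close()
    worker.join()
    listener.close()
    return reply


reply = demo()
print(reply)
```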

~~~
stusmall
I love beej's guide! It's where I learned socket programming. It is an art to
make such a dry subject as entertaining as he does.

------
jokoon
I'm still curious about an explanation of why we have both TCP and UDP.

For example if you do peer to peer, you need low latency, and UDP is best for
that.

I think it's because TCP is hardware optimized, but it's designed to transmit
a file in a stream, so if a packet is corrupt, it just waits to send that
packet. In that fashion, TCP tends to be slower, but on average it's more
efficient for single files or webpages.

You don't have good granularity with TCP, but if you want to work with UDP,
you need to add redundancy and other mechanisms to make sure all is good.

ENet is an example of using UDP for gaming, so the goal is to have the lowest
latency possible.

~~~
zAy0LfpBZLC8mAC
Bittorrent is also peer to peer, and it doesn't need low latency. Really, it's
about latency, nothing to do with peer to peer.

TCP has head-of-line blocking, as it guarantees complete and in-order
delivery, so when a packet gets lost in transit, it has to wait for a
retransmit of the missing packet, whereas UDP delivers packets to the
application as they arrive, including duplicates and without any guarantee
that a packet arrives at all or which order they arrive (it really is
essentially IP with port numbers and an (optional) payload checksum added),
but that is fine for telephony, for example, where it usually simply doesn't
matter when a few milliseconds of audio are missing, but delay is very
annoying, so you don't bother with retransmits, you just drop any duplicates,
sort reordered packets into the right order for a few hundred milliseconds of
jitter buffer, and if packets don't show up in time or at all, they are simply
skipped, possibly interpolated where supported by the codec.
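
A rough sketch of such a jitter buffer, under simplifying assumptions that are mine and not from any real VoIP stack: frames are numbered from 0, frame `seq` is nominally due at `seq * frame_ms`, and playout is delayed by a fixed `buffer_ms`. The function name `playout` and all numbers are made up for illustration.

```python
def playout(arrivals, frame_ms=20, buffer_ms=60):
    """Decide what a receiver plays for each frame slot.

    arrivals: list of (arrival_ms, seq, frame) in the order received.
    Returns a list indexed by seq; None marks a frame that is skipped
    because it was lost or arrived after its playout deadline.
    """
    if not arrivals:
        return []
    last_seq = max(seq for _, seq, _ in arrivals)
    slots = [None] * (last_seq + 1)
    for arrival_ms, seq, frame in arrivals:
        deadline = seq * frame_ms + buffer_ms
        if arrival_ms > deadline:
            continue  # too late to be useful, no retransmit is requested
        if slots[seq] is not None:
            continue  # duplicate, drop it
        slots[seq] = frame  # storing by seq puts reordered packets in order
    return slots


arrivals = [
    (5, 0, "a"),
    (48, 2, "c"),    # arrives before frame 1: reordered
    (50, 1, "b"),
    (52, 1, "b"),    # duplicate
    (200, 3, "d"),   # arrives long after its deadline of 120 ms
    (90, 4, "e"),
]
print(playout(arrivals))
```

The `None` slot is where a codec with loss concealment would interpolate; a plain one would just play silence.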

Also, a major part of TCP is congestion control, to make sure you get as much
throughput as possible, but without overloading the network (which is kinda
redundant, as an overloaded network will drop your packets, which means you'd
have to do retransmits, which hurts throughput), UDP doesn't have any of that
- which makes sense for applications like telephony, as telephony with a given
codec needs a certain amount of bandwidth, you can not "slow it down", and
additional bandwidth also doesn't make the call go faster.

In addition to realtime/low latency applications, UDP makes sense for really
small transactions, such as DNS lookups, simply because it doesn't have the
TCP connection establishment and teardown overhead, both in terms of latency
and in terms of bandwidth use. If your request is smaller than a typical MTU
and the response probably is, too, you can be done in one roundtrip, with no
need to keep any state at the server, and flow control and ordering and all
that probably isn't particularly useful for such uses either.
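
To illustrate the "one roundtrip, no state" shape of such a transaction, here is a small loopback sketch. It is not DNS, just one datagram each way; the function name `udp_round_trip` and the payloads are invented, and a real client would need its own retry logic since either datagram can be lost.

```python
import socket


def udp_round_trip(question):
    """Send one datagram, get one datagram back, with no connection setup."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.settimeout(1.0)
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(1.0)         # UDP gives no delivery guarantee
    try:
        # No handshake: the first packet on the wire already carries the request.
        client.sendto(question, server.getsockname())

        # The server learns the client's address from the datagram itself
        # and keeps nothing around after replying.
        request, client_addr = server.recvfrom(2048)
        server.sendto(b"answer to " + request, client_addr)

        reply, _ = client.recvfrom(2048)
        return reply
    finally:
        client.close()
        server.close()


reply = udp_round_trip(b"example.org?")
print(reply)
```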

And then, you can use UDP to build your own TCP replacements, of course, but
it's probably not a good idea without some deep understanding of network
dynamics, modern TCP algorithms are pretty sophisticated.

Also, I guess it should be mentioned that there is more than UDP and TCP, such
as SCTP and DCCP. The only problem currently is that the (IPv4) internet is
full of NAT gateways which make it impossible to use protocols other than UDP
and TCP in end-user applications.

~~~
bachback
you made some very interesting posts. what I'm missing from this is 0mq. I
believe it introduces layering mechanisms, so that one can re-use patterns to
build cool stuff (anything really), without knowing specific details. do you
have an email where I can reach you?

------
sajal83
> The improvements of using HTTP pipelining can be quite dramatic over high-
> latency connections – which is what you have when your iPhone is not on Wi-
> Fi. In fact, there’s been some research that suggests that there’s no
> additional performance benefit to using SPDY over HTTP pipelining on mobile
> networks

Excellent summary but I think pipelining has been oversimplified. HTTP
pipelining is a FIFO queue. The responses have to be delivered in the same
order as the requests. So if the first (or an early) response took longer to
generate, all other requests in the pipeline have to wait, which is something
SPDY is not susceptible to.
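
A back-of-the-envelope model of that difference, assuming the server works on all requests in parallel and ignoring transfer time entirely. The function names and the millisecond figures are hypothetical, chosen only to show the ordering effect.

```python
from itertools import accumulate


def pipelined_delivery(ready_times):
    """HTTP pipelining: responses go out strictly in request order.

    A response can be delivered no earlier than every response ahead
    of it in the queue, so delivery time is a running maximum.
    """
    return list(accumulate(ready_times, max))


def multiplexed_delivery(ready_times):
    """Multiplexed streams (as in SPDY): each response goes out when ready."""
    return list(ready_times)


# Time in ms at which the server has each response ready, in request order.
ready = [900, 50, 60]
print(pipelined_delivery(ready))
print(multiplexed_delivery(ready))
```

With the slow response first, the two fast ones are stuck behind it when pipelined; put the slow one last and the two schemes deliver at the same times.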

------
teddyh
I prefer _The Unix and Internet Fundamentals HOWTO_ :

[http://en.tldp.org/HOWTO/Unix-and-Internet-Fundamentals-
HOWT...](http://en.tldp.org/HOWTO/Unix-and-Internet-Fundamentals-HOWTO/)

------
brudgers
David Wetherall teaches this course @ Coursera.

[https://www.coursera.org/course/comnetworks](https://www.coursera.org/course/comnetworks)

He pretty much wrote the book.

------
notfoss
There's a minor typo below the HTTPS section. It should be TLS not TSL ;)

Edit: By the way, it was a nice article. I especially liked the tcpdump
explanation.

