I understand the enthusiasm to spread knowledge, but a wall of text like that isn't going to be helpful for novices. When technical people try to impart knowledge, the text they write usually suffers from the Curse of Knowledge. For example, "DHCP" and "UDP" are mentioned several times, which is a level of detail that is not necessary for an introductory overview.
Instead, explaining how the internet works would require lots of diagrams, pictures, videos, etc. illustrating different scenarios. E.g., sending mail over SMTP is different from fetching a page over HTTP, and both use DNS. All these "mental buckets" (and many others) are difficult for novices to separate. Instead of drilling into a timeline of bytes sent from Alice to Bob, new concepts can be taught as "progressive layers" of detail (with the timeline rewound each time). Answering the question of "how the internet works" is so open-ended that, apparently, some tech interviewers like to use it for evaluating job candidates.
If a reader were to actually use networks for something (like try to write a program that uses TCP), they would obviously need much more detailed knowledge. At that point, diagrams become necessary, as do much deeper explanations, but the goal of this post was to give an overview of what things are needed to provide end-to-end communication, and why.
It doesn't address the 'progressive layers' issue, but it _does_ include a TOC.
I have been asked this question before, and my answer was definitely biased towards the networking parts (DNS, TCP/IP, HTTP/S, certificates ...), and I provided absolutely no detail about keyboard/mouse/monitor I/O, parsing or rendering.
It does an astonishingly good job of describing how networks evolved and why they function as they do today (as of the mid-'90s; not much has changed AFAICT).
Even the Kindle version of that book is $100.
"When Bob "sumbits" this form, another HTTP request will be sent"
It took a moment to decide that it probably wasn't intentional :)
It's also quite misleading to talk about the hostname once you've mentioned BGP. At that layer, the routers have completely "forgotten" the hostname, and are only looking at IP address/prefix to move that packet.
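To make the "only looking at IP address/prefix" point concrete, here is a minimal sketch of a longest-prefix-match lookup. The table, the next-hop labels, and the function name are all hypothetical (the addresses come from the reserved documentation ranges), and a real router does this in specialised hardware rather than a Python loop. The thing to notice is that the lookup takes an IP address and nothing else; a hostname never reaches it.

```python
import ipaddress

# Hypothetical forwarding table: prefix -> next-hop label.
# Addresses are from the documentation ranges; nothing here is from the post.
ROUTES = {
    ipaddress.ip_network("0.0.0.0/0"): "default-upstream",
    ipaddress.ip_network("203.0.113.0/24"): "peer-A",
    ipaddress.ip_network("203.0.113.128/25"): "peer-B",
}


def next_hop(destination: str) -> str:
    """Return the next hop of the most specific prefix containing destination."""
    addr = ipaddress.ip_address(destination)
    # Collect every prefix that contains the address...
    matches = [net for net in ROUTES if addr in net]
    # ...and keep the longest one (largest prefix length wins).
    best = max(matches, key=lambda net: net.prefixlen)
    return ROUTES[best]
```

With this table, `next_hop("203.0.113.200")` picks the /25 over the /24 that also contains it, and an address outside both falls through to the default route.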
I understand why one might want to illustrate the fact that mail delivered via the SMTP protocol might be retrieved from the same server via the HTTP protocol running on a different port. But this is not a feature of TCP. The term "TCP multiplexing" commonly refers to reusing the same end-to-end TCP session for multiple application requests (e.g. more than one HTTP GET to the same server/port). That is very different from IP multiplexing.
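For readers who want to see the "more than one HTTP GET over the same TCP session" case, here is a small loopback-only sketch using Python's standard library. The handler, the paths, and the function name are made up for illustration. It sends two GETs through one `HTTPConnection` and records the client's local port each time; if the port is unchanged, both requests travelled over the same TCP connection.

```python
import http.client
import http.server
import threading


class EchoPathHandler(http.server.BaseHTTPRequestHandler):
    # HTTP/1.1 plus an explicit Content-Length lets the connection stay open.
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        body = self.path.encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the sketch quiet


def two_gets_one_connection():
    """Issue two GETs on one connection; return (body, client_port) pairs."""
    server = http.server.HTTPServer(("127.0.0.1", 0), EchoPathHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
    results = []
    for path in ("/first", "/second"):
        conn.request("GET", path)
        body = conn.getresponse().read().decode()
        # The client's local port identifies the underlying TCP connection.
        results.append((body, conn.sock.getsockname()[1]))
    conn.close()
    server.shutdown()
    server.server_close()
    return results
```

Whether this should be called "TCP multiplexing" or something else is a naming question; the sketch only shows the behaviour.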
But email is also a very bad example to use to illustrate IP multiplexing, because email uses a store-and-forward mechanism very unlike HTTP lookups. A sender's email client just doesn't do lookups for a remote email provider. A mail client has a provider-specified SMTP server for all outbound mail traffic, regardless of destination. The client does use DNS to find that server's IP address. But the sender's provider's outbound email server performs an entirely different category of DNS lookup to figure out the destination mail server for the domain. This difference makes it a particularly bad idea to mention SMTP at all, only to say that it's unimportant.
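The two kinds of lookup can be sketched with a toy in-memory table standing in for DNS. Every name, address, and function below is hypothetical (reserved `.example` names and documentation address ranges); a real resolver queries the network. The client only asks for the address of its configured outbound server, while the provider's server asks for the MX records of the recipient's domain and then resolves the preferred exchanger.

```python
# Hypothetical DNS data keyed by (name, record type). Not real records.
ZONE = {
    ("smtp.alice-isp.example", "A"): ["192.0.2.10"],
    ("bob-mail.example", "MX"): [
        (20, "mx2.bob-mail.example"),
        (10, "mx1.bob-mail.example"),
    ],
    ("mx1.bob-mail.example", "A"): ["198.51.100.25"],
    ("mx2.bob-mail.example", "A"): ["198.51.100.26"],
}


def resolve(name, rtype):
    """Toy stand-in for a DNS query; returns a list of records."""
    return ZONE.get((name, rtype), [])


def client_submission_target(configured_server):
    """Mail client: address lookup of its provider-specified outbound server."""
    return resolve(configured_server, "A")[0]


def relay_delivery_target(recipient):
    """Provider's outbound server: MX lookup on the recipient's domain,
    then an address lookup for the exchanger with the lowest preference value."""
    domain = recipient.rsplit("@", 1)[1]
    _preference, exchanger = min(resolve(domain, "MX"))
    return resolve(exchanger, "A")[0]
```

Note that `client_submission_target` never sees the recipient's domain at all, which is the asymmetry described above.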
I'm sorry this is such harsh criticism. It is extremely difficult to learn this material, let alone teach it to someone else in a digestible fashion. So please don't let this discourage you from trying to write a document like this. But it would be a lot more effective if you just didn't try at all to explain quite so much material in one post. Hope that helps a little?
You're right that BGP doesn't care about hostnames, or even individual IPs, but I'm not sure that distinction is actually relevant to someone who is being exposed to this just now.
No, I disagree. IP has no multiplexing features beyond protocol multiplexing. Port numbers that allow multiplexing a single IP among multiple applications on the same host only appear in UDP/TCP. What you are referring to is specifically called HTTP multiplexing, and has little to do with TCP.
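As a rough illustration of port numbers sharing one IP among several applications, the sketch below starts two tiny TCP services on the same loopback address and lets the OS pick a different port for each. The greetings and function names are invented for the example and are not real SMTP or HTTP; the only point is that the same IP plus a different port reaches a different application.

```python
import socket
import threading


def serve_once(listener, greeting):
    """Accept a single connection, send the greeting, then close everything."""
    with listener:
        conn, _ = listener.accept()
        with conn:
            conn.sendall(greeting)


def start_service(greeting):
    """Listen on 127.0.0.1 at an OS-chosen port; return that port number."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 asks the OS for any free port
    listener.listen(1)
    port = listener.getsockname()[1]
    threading.Thread(
        target=serve_once, args=(listener, greeting), daemon=True
    ).start()
    return port


def fetch(port):
    """Connect to 127.0.0.1:port and read until the service closes."""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
        chunks = []
        while data := client.recv(1024):
            chunks.append(data)
        return b"".join(chunks)
```

Calling `start_service(b"mail")` and `start_service(b"web")` gives two ports on the same address, and `fetch` on each port returns the matching greeting.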
Again, technically, you are right; however, there is nothing in SMTP that precludes the sender from contacting the destination SMTP server directly (well, except that spam detection systems will freak out). The fact that SMTP supports hop-by-hop forwarding, and that in many cases this is the only mode used, does not mean that the protocol requires it. And in the interest of making the content easier to understand, I decided that describing multi-hop SMTP was simply unnecessary.
I appreciate the feedback --- it's always hard to piece together relatively complex posts like this. As I mentioned elsewhere, I specifically wanted this to be a single post such that readers could follow the communication flow end-to-end. I believe this will improve reader comprehension, though of course YMMV.
Set up a very simple home network topology: a router/modem and a wired Ethernet connection to a laptop. Explain all 4 layers of the TCP/IP model. Explain DNS.
BGP seems sort of advanced for an introduction to the topic, to be honest, and so few people use SMTP directly rather than just using Gmail's web interface that it's rather obscure.