How the Internet works (thesquareplanet.com)
45 points by Jonhoo on Oct 12, 2015 | 20 comments

>In the interest of making the topic easier to comprehend, we will start with a fairly high-level overview, and then dive into each individual component as we go along. Our journey starts with a user called Alice. Alice owns a laptop, and wants to send e-mail to a person called Bob. ... [wall of text...]

I understand the enthusiasm to spread knowledge, but a wall of text like that isn't going to be helpful for novices. When technical people try to impart knowledge, the text they write usually suffers from the Curse of Knowledge[1]. For example, "DHCP" and "UDP" are mentioned several times, which is a level of detail that is not necessary for an introductory overview.

Instead, explaining how the internet works requires lots of diagrams, pictures, videos, etc. illustrating different scenarios. E.g., sending mail over SMTP is different from fetching a page over HTTP, and both use DNS. All these "mental buckets" (and many others) are difficult to separate for novices. Instead of drilling into a timeline of bytes sent from Alice to Bob, new concepts can be taught as "progressive layers" of detail (with the timeline rewound each time). Answering the question of "how the internet works" is so open-ended that, apparently, some tech interviewers like to use it for evaluating job candidates.


I see your point, but I think it's still important to give an end-to-end description, allowing people to tie it all together. The wall of text, if you'd read it, uses progressive layers in several places (like core Internet routing), specifically to try and avoid information overload for the reader.

If a reader were to actually use networks for something (like try to write a program that uses TCP), they would obviously need much more detailed knowledge. At that point, diagrams become necessary, as do much deeper explanations, but the goal of this post was to give an overview of what things are needed to provide end-to-end communication, and why.

I like this version: https://github.com/alex/what-happens-when

It doesn't address the 'progressive layers' issue, but it _does_ include a TOC.

I have been asked this question before, and my answer was definitely biased towards the networking parts (DNS, TCP/IP, HTTP/S, certificates ...), and I provided absolutely no detail about keyboard/mouse/monitor I/O, parsing or rendering.

It is getting a bit old now, but I showed the Warriors of the Net[0] video in the past to a few non-technical people who wanted to better understand things.

[0] http://warriorsofthe.net/index.html

Agree on all your points. I also think the general structure can be improved by adding a table of contents. This will better guide the user through the wall of text.

Have you seen Code.org's video on routing? Might be worth linking to, as it is very simple to understand:


I'd honestly recommend that people just pick up a copy of Andrew S. Tanenbaum and David J. Wetherall's Computer Networks and devote some time to getting through it.

I'd also recommend "How the World Was One" ( http://www.amazon.com/How-World-Was-Arthur-Clarke/dp/0553074... )

It does an astonishingly good job of describing how networks evolved and why they function as they do today (as of the mid-90s; not much has changed AFAICT).

Just bought a used paperback version from Amazon, thanks!

Also, Tubes by Andrew Blum to get a taste of the physical part of the Internet. (http://www.amazon.com/Tubes-A-Journey-Center-Internet/dp/006...)

Any recommendations that aren't textbooks?

Even the Kindle version of that book is $100.

So I just copy/pasted the book recommendation into search, and the first result was a pdf of the entire book hosted on an edu server. Forgive my ignorance, but would this be a pirated copy or something? This often happens when I'm searching for book recommendations like this and I just assumed older versions were pushed out into the public domain or something.

Apologies in advance for a near useless comment, but this typo? made me smile:

"When Bob "sumbits" this form, another HTTP request will be sent"

It took a moment to decide that it probably wasn't intentional :)

Hehe, thanks, fixed.

The "Tier" system is out-dated at best and shouldn't be regurgitated in this day and age.

I'd be happy to get rid of it if it is indeed outdated as you say, though that was not my impression. Could you give me some references on how core Internet routing is done now?

Plenty of people will still talk about providers that way, but it has never been about topology the way you describe. BGP is a mesh, not a tree. Packets that do have to leave one network are routed to the closest BGP peer. Plenty of small providers peer directly with other small providers (and always have).

It's also quite misleading to talk about the hostname once you've mentioned BGP. At that layer, the routers have completely "forgotten" the hostname, and are only looking at IP address/prefix to move that packet.
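
To make the "only looking at IP address/prefix" point concrete, here is a minimal Python sketch of longest-prefix matching, which is how a router typically picks a next hop once routes have been learned. The forwarding table, prefixes, and next-hop names are all made up for illustration; this is not how any particular router stores its routes.

```python
import ipaddress

# Hypothetical forwarding table: prefix -> next-hop label (invented values).
FORWARDING_TABLE = {
    "0.0.0.0/0": "upstream-provider",
    "203.0.113.0/24": "peer-a",
    "203.0.113.128/25": "peer-b",
}

def next_hop(destination: str) -> str:
    """Return the next hop for the most specific prefix containing destination."""
    addr = ipaddress.ip_address(destination)
    matches = []
    for prefix, hop in FORWARDING_TABLE.items():
        network = ipaddress.ip_network(prefix)
        if addr in network:
            matches.append((network.prefixlen, hop))
    # The longest (most specific) matching prefix wins; the default route
    # 0.0.0.0/0 guarantees at least one match for any IPv4 address.
    return max(matches)[1]
```

Note that no hostname appears anywhere in the lookup: by the time a packet reaches this step, DNS has already turned the name into an address.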

I understand why one might want to illustrate the fact that mail delivered via the SMTP protocol might be retrieved from the same server via the HTTP protocol running on a different port. But this is not a feature of TCP. The term "TCP multiplexing" commonly refers to reusing the same end-to-end TCP session for multiple application requests (e.g. more than one HTTP GET to the same server/port). That is very different from IP multiplexing.

But email is also a very bad example to use to illustrate IP multiplexing, because email uses a store-and-forward mechanism very unlike HTTP lookups. A sender's email client just doesn't do lookups for a remote email provider. A mail client has a provider-specified SMTP server for all outbound mail traffic, regardless of destination. The client does use DNS to find that server's IP address. But the sender's provider's outbound email server performs an entirely different category of DNS lookup to figure out the destination mail server for the domain. This difference makes it a particularly bad idea to mention SMTP at all, only to say that it's unimportant.
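
The two categories of lookup described above can be sketched in Python with a toy record table. Every hostname and address below is invented for illustration, and `lookup` is a stand-in for a real resolver rather than an actual DNS API.

```python
# Toy DNS data; all names and addresses here are made up.
RECORDS = {
    ("smtp.alice-isp.example", "A"): ["192.0.2.10"],
    ("bob-mail.example", "MX"): ["mx1.bob-mail.example"],
    ("mx1.bob-mail.example", "A"): ["198.51.100.25"],
}

def lookup(name: str, record_type: str) -> list:
    """Stand-in for a resolver query; returns an empty list if nothing matches."""
    return RECORDS.get((name, record_type), [])

def client_submission_target(configured_smtp_host: str) -> str:
    """The mail client resolves only its own provider's SMTP server (A record)."""
    return lookup(configured_smtp_host, "A")[0]

def provider_delivery_target(recipient: str) -> str:
    """The provider's server looks up the recipient domain's MX, then its address."""
    domain = recipient.rsplit("@", 1)[1]
    mx_host = lookup(domain, "MX")[0]
    return lookup(mx_host, "A")[0]
```

The client never sees the recipient's domain in a DNS query here; only the provider's outbound server performs the MX lookup.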

I'm sorry this is such harsh criticism. It is extremely difficult to learn this material, let alone teach it to someone else in a digestible fashion. So please don't let this discourage you from trying to write a document like this. But it would be a lot more effective if you just didn't try at all to explain quite so much material in one post. Hope that helps a little?

I never refer to it as a tree, only as, effectively, a tiered mesh. While it's true that this breaks down with multi-hop BGP forwarding, I think it conveys enough about how the routing works for a layperson to understand, and it is relatively close to the truth, no? I could potentially cut down the section on Internet routing significantly, and simply mention that it is a routing fabric that does relatively greedy hop-by-hop forwarding, though I'm not sure that will be more digestible.

You're right that BGP doesn't care about hostnames, or even individual IPs, but I'm not sure that distinction is actually relevant to someone who is being exposed to this just now.

No, I disagree. IP has no multiplexing features beyond protocol multiplexing. Port numbers that allow multiplexing a single IP among multiple applications on the same host only appear in UDP/TCP. What you are referring to is specifically called HTTP multiplexing, and has little to do with TCP.
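
A rough Python sketch of that distinction, assuming a single host with a few made-up listeners: the IP header only says which transport protocol the payload belongs to, and the port number inside the TCP or UDP header then selects the application. The protocol numbers 6 and 17 are the standard values for TCP and UDP; the listener names are hypothetical.

```python
# Values of the "protocol" field in the IP header.
IP_PROTO_TCP = 6
IP_PROTO_UDP = 17

# Hypothetical services on one host, keyed by (transport protocol, port).
LISTENERS = {
    (IP_PROTO_TCP, 25): "smtp-server",
    (IP_PROTO_TCP, 80): "http-server",
    (IP_PROTO_UDP, 53): "dns-server",
}

def deliver(ip_protocol: int, dest_port: int) -> str:
    """Pick the receiving application for a packet addressed to this host.

    IP contributes only ip_protocol; dest_port comes from the TCP/UDP header.
    """
    return LISTENERS.get((ip_protocol, dest_port), "no listener")
```

For example, `deliver(IP_PROTO_TCP, 25)` and `deliver(IP_PROTO_TCP, 80)` reach different applications at the same IP address, which is the SMTP-versus-HTTP case discussed above.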

Again, technically, you are right; however, nothing in SMTP precludes the sender from contacting the destination SMTP server directly (well, except that spam detection systems will freak out). The fact that SMTP supports hop-by-hop forwarding, and that in many cases this is the only mode used, does not mean that the protocol requires it. And in the interest of making the content easier to understand, I decided describing multi-hop SMTP was simply unnecessary.

I appreciate the feedback --- it's always hard to piece together relatively complex posts like this. As I mentioned elsewhere, I specifically wanted this to be a single post such that readers could follow the communication flow end-to-end. I believe this will improve reader comprehension, though of course YMMV.

If I were writing an introduction to lay people, I'd probably just explain the process of loading facebook.com, something people will be very familiar with.

Set up a very simple home network topology: a router/modem, and a wired Ethernet connection to a laptop. Explain all 4 layers of the TCP/IP model. Explain DNS.

BGP seems sort of advanced for an introduction to the topic, to be honest, and so few people use SMTP directly rather than just using Gmail's web interface that it's rather obscure.
