
Ask HN: How, bottom to top, does a modern web application work? - dnotrael
Hi y'all.

While refactoring a very old PHP website, I wanted to make it as portable as possible so switching providers (such as DigitalOcean to some other VPS) would be easier in the future. I came across this: http://12factor.net/port-binding, talking about exposing the web site on a port. I had always wondered why Rails was usually on port 3000 while my local PHP website just ran at some URL. I got to digging more and I think I started drowning in confusion. Does Apache listen on 80/443 and just spawn PHP processes (or so I've read)? Is this what it does with Phusion Passenger for Rails? If so, does Rails still get served at localhost:3000 in this case?

It mentions the Thin client for Ruby, which is a web server, that is built on top of Rack, which is a web server interface, but there is also a reverse proxy Nginx (but Nginx is also a web server)? Rack and Thin seem to have identical interfaces, so I'm not sure what is going on here. Why doesn't a PHP app have to do any of that? Is it because Apache handles it? I also don't understand the details of CGI, FastCGI, WSGI, etc. and why each language needs its own gateway interface. Is Rack the WSGI of Ruby? I understand (I think) that at a very high level, it is a specification for a web server that, instead of serving documents, can pass an HTTP request along to a program which returns a response, which the web server forwards back to the requester.

I feel like I have a lot of details but I have no idea how they all fit together. I think the answer I'm looking for would start: "A user makes an HTTP request from their browser to retrieve a web page sitting on some random machine somewhere. There is a socket connection established (IP:PORT#) and it sends streams of bytes under the TCP protocol..." <-- Not even sure if this is correct

Thanks in advance for any help or redirection to other resources!
======
hoodoof
Your path to enlightenment begins by learning at a detailed level what a web
server does, and what a web browser says to that server.

Do these tutorials:

[http://ruslanspivak.com/lsbaws-part1/](http://ruslanspivak.com/lsbaws-part1/)

[http://ruslanspivak.com/lsbaws-part2/](http://ruslanspivak.com/lsbaws-part2/)

[http://ruslanspivak.com/lsbaws-part3/](http://ruslanspivak.com/lsbaws-part3/)

And when you are doing web development, use this command to understand exactly
what is being passed between client and server (replace port number with the
port number that your server is running on).

sudo ngrep -W byline -d lo port 8001

You have progressed to the next rank in your journey when you understand in
detail what a request (typically from a browser) looks like and how it is
structured, and what a response (from a web server) looks like and how it is
structured.
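
To make that structure concrete, here is a minimal Python sketch that takes a raw request apart and assembles a raw response. The helper names `parse_request` and `build_response` and the sample request are made up for illustration, and real servers handle many cases this ignores (repeated headers, chunked bodies, malformed input).

```python
def parse_request(raw: bytes):
    """Split a raw HTTP/1.x request into its parts."""
    head, _, body = raw.partition(b"\r\n\r\n")  # a blank line ends the headers
    lines = head.decode("iso-8859-1").split("\r\n")
    method, target, version = lines[0].split(" ", 2)  # e.g. GET /hello HTTP/1.1
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers, body


def build_response(status: str, body: bytes, content_type: str = "text/plain") -> bytes:
    """Assemble a raw HTTP/1.1 response: status line, headers, blank line, body."""
    head = (
        f"HTTP/1.1 {status}\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body


# Hypothetical request, similar to what ngrep would show on the wire.
raw_request = (
    b"GET /hello?name=world HTTP/1.1\r\n"
    b"Host: localhost:8001\r\n"
    b"User-Agent: example\r\n"
    b"\r\n"
)
method, target, version, headers, body = parse_request(raw_request)
print(method, target, version, headers)
print(build_response("200 OK", b"Hello, world!\n").decode("ascii"))
```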

~~~
dnotrael
This is exactly the kind of resource I was looking for. Thank you. I hope he
keeps making more tutorials because I am very interested in the other things
on his About page. I'm still excited to see what other answers people have.

------
arh68
You can run 1 web server or 2. You need at least one listening on :80; it can
either pass the request off to a module (like mod_php, mod_perl, CGI) or it
can pass the request off to another web server. The 12factor site is espousing
this second approach, separating the front-end server (perhaps nginx on :80)
from the application server (perhaps apache on :8080/3000, running
mod_whatever, or in their case, a standalone app server like Jetty/Tornado).
Almost any web server can run as a reverse proxy, but mod_php/mod_perl usually
require a specific server like Apache.

If you've got a dollar, and an hour, maybe run through setting it all up [1],
then run lsof -i at the end to understand what's listening on what ports.

[1] [https://www.digitalocean.com/community/tutorials/how-to-conf...](https://www.digitalocean.com/community/tutorials/how-to-configure-nginx-as-a-web-server-and-reverse-proxy-for-apache-on-one-ubuntu-14-04-droplet)

~~~
dnotrael
Thanks for making the distinction between the 1 and 2 web server set-ups. I
actually have a DO droplet and I was looking to move the application to Nginx
+ FPM just for learning purposes so that tutorial is helpful. I've also read
some of the other tutorials on Nginx.

~~~
HatersGunnaHate
The way I learned how to set up a server without using a control panel of some
kind is by diving right in.

At first I messed around on a local virtual machine running the same distro I
planned to use.

In a few hours I had NGINX, PHP-FPM and Rails apps up and running on a server.

------
javajosh
Well it's good you're aware of processes and ports, because that's the key
thing. Then I would focus more on how they relate to each other, and to the
outside world.

Server processes bind to a port, and then run some code to handle inbound
messages. For a web server, the inbound messages are string key/value pairs.
By convention, there is special support for file access in HTTP, which is the
path section of the URL. Webapps use the path just as an ordinary string.
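
A rough Python sketch of "bind to a port, then run some code on inbound messages" might look like this. The names `serve_once` and `demo` are invented for the example, it uses port 0 so the OS picks a free port, and it handles exactly one connection, which no real server would do.

```python
import socket
import threading


def serve_once(listener: socket.socket) -> None:
    """Accept one connection, read the request, and send a fixed response."""
    conn, _addr = listener.accept()  # blocks until a client connects
    with conn:
        conn.recv(65536)  # the request bytes; a real server would parse them
        body = b"hello from a tiny server\n"
        conn.sendall(
            b"HTTP/1.1 200 OK\r\n"
            b"Content-Type: text/plain\r\n"
            b"Content-Length: " + str(len(body)).encode("ascii") + b"\r\n"
            b"Connection: close\r\n"
            b"\r\n" + body
        )


def demo() -> bytes:
    # Port 0 asks the OS for any free port; a public web server would bind 80 or 443.
    listener = socket.create_server(("127.0.0.1", 0))
    port = listener.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(listener,))
    worker.start()

    # Play the client: open a TCP connection and send a request by hand.
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(b"GET / HTTP/1.1\r\nHost: localhost\r\n\r\n")
        chunks = []
        while True:
            chunk = client.recv(4096)
            if not chunk:  # the server closed the connection
                break
            chunks.append(chunk)
    worker.join()
    listener.close()
    return b"".join(chunks)


if __name__ == "__main__":
    print(demo().decode("ascii"))
```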

There are really two parts to understanding the system: first, configuring all
the processes and spinning them up in the right order. Second, once it's
running, how messages flow through those processes. A reverse proxy is
terrible jargon for a server process that listens on one port and forwards the
traffic to another (usually local) port. (It's called a reverse proxy because
forward proxies are used by clients connecting out to servers).
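
For what it's worth, a reverse proxy in this sense can be sketched in a few lines of Python using only the standard library. This is a toy under stated assumptions: `AppHandler`, `make_proxy_handler`, `fetch`, and `demo` are made-up names, only GET is handled, no request headers are forwarded, and there is no error handling. nginx does far more than this.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def fetch(port: int, path: str):
    """Make a plain HTTP GET to a local port and return (status, type, body)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Content-Type", "text/plain"), resp.read()
    finally:
        conn.close()


class QuietHandler(BaseHTTPRequestHandler):
    def log_message(self, *args):  # keep the demo output quiet
        pass

    def reply(self, status: int, ctype: str, body: bytes) -> None:
        self.send_response(status)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


class AppHandler(QuietHandler):
    """Stand-in for the application server (the role Rails or PHP plays)."""

    def do_GET(self):
        self.reply(200, "text/plain", f"app saw {self.path}\n".encode("utf-8"))


def make_proxy_handler(upstream_port: int):
    class ProxyHandler(QuietHandler):
        """Stand-in for the front-end server (the role nginx plays)."""

        def do_GET(self):
            # Forward the same path to the upstream port and relay its answer.
            self.reply(*fetch(upstream_port, self.path))

    return ProxyHandler


def demo():
    app = HTTPServer(("127.0.0.1", 0), AppHandler)  # port 0: OS picks a free port
    proxy = HTTPServer(("127.0.0.1", 0), make_proxy_handler(app.server_port))
    for server in (app, proxy):
        threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # The client only ever talks to the proxy's port.
        return fetch(proxy.server_port, "/hello")
    finally:
        for server in (proxy, app):
            server.shutdown()
            server.server_close()


if __name__ == "__main__":
    print(demo())
```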

You might have 4 server processes. Nginx, Apache/PHP, Rails (RVM) and MySQL.
The first three _can_ all listen on port 80, but we pick Nginx to serve that
role, and configure it to do so and forward to Apache and the RVM, which are
configured for arbitrary higher ports. Meanwhile, you've probably configured
Apache and RVM to be able to talk to MySQL (with 'drivers') and configured
them to know about your running MySQL instance. All 4 of these processes are
probably emitting logs to disk somewhere, too.

Interestingly, in the absence of a request, all of these processes just sit
there. They do nothing, and use precisely 0 of your CPU. It is only when a
client process presents Nginx on port 80 with a new string that they all come
to life: nginx first, then PHP (say), then, if the stars align, MySQL. The
confirmation bubbles back up a distributed call chain and out to the client.

Yay, CRUD.

------
bgurupra
Here is a detailed crowdsourced answer to the interview question "What
happens when you type google.com into your browser's address box and press
enter?"

[https://github.com/alex/what-happens-when](https://github.com/alex/what-happens-when)

------
nostrademons
This makes a great interview question. :-)

The short answer is that it depends on how you set up your webserver.
Different languages have different defaults, and of course you can override
the defaults and set them up entirely differently.

A typical PHP installation runs as an Apache module. In this setup, the Apache
server listens for HTTP requests and looks at their request path and virtual
host. When it finds one that matches a PHP rule (defined in your Apache
configuration), it starts the PHP interpreter, setting variables like $_GET
and $_POST from the request data. On very old PHP installations, it then uses
the path to locate the PHP script, parses it, and executes it. In newer
(post-~2005) installations, the server caches the compiled PHP script in
memory and executes it again on subsequent requests, without having to hit
disk to read the file contents.
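
As a rough illustration of the "setting variables like $_GET and $_POST" step, here is a Python sketch of turning request data into two dictionaries. It is only an analogy, not PHP's actual implementation: the function name `superglobals` is made up, and PHP features such as array-style keys (`name[]`) and file uploads are ignored.

```python
from urllib.parse import parse_qsl, urlsplit


def superglobals(method: str, target: str, body: bytes = b"", content_type: str = ""):
    """Build GET-like and POST-like dictionaries from the parts of a request."""
    # The query string is everything after "?" in the request target.
    get = dict(parse_qsl(urlsplit(target).query, keep_blank_values=True))
    post = {}
    # Form posts carry the same key=value&key=value encoding in the body.
    if method == "POST" and content_type.startswith("application/x-www-form-urlencoded"):
        post = dict(parse_qsl(body.decode("utf-8"), keep_blank_values=True))
    return get, post


get, post = superglobals(
    "POST",
    "/index.php?page=2&sort=asc",
    b"name=Ada&lang=php",
    "application/x-www-form-urlencoded",
)
print(get)
print(post)
```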

A Rails or Django deployment circa 2007 would use Nginx or Lighttpd as the
webserver, and then communicate with a separate application server over
FastCGI or SCGI. The latter are simple binary protocols that are designed
solely to communicate between webserver and appserver; they basically include
the information in the HTTP request, but in a parsed, compact format that's
fast to decode. The application server would then decode the request and pass
it to a web framework to execute, returning an HTML response that is forwarded
on by the webserver.
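
To give a feel for how simple these protocols are, here is a Python sketch of SCGI framing as I understand it from the SCGI spec: the headers are NUL-terminated name/value pairs wrapped in a netstring, followed by the body. The function names are made up, and FastCGI (which uses binary records and is more involved) is not shown.

```python
def encode_scgi_request(headers: dict, body: bytes = b"") -> bytes:
    """Encode a request in SCGI framing: a netstring of headers, then the body."""
    # CONTENT_LENGTH must come first, and an "SCGI" header with value "1" is required.
    items = [("CONTENT_LENGTH", str(len(body))), ("SCGI", "1")]
    items += [(k, v) for k, v in headers.items() if k not in ("CONTENT_LENGTH", "SCGI")]
    # Each name and each value is terminated by a NUL byte.
    block = b"".join(
        k.encode("latin-1") + b"\x00" + v.encode("latin-1") + b"\x00" for k, v in items
    )
    # Netstring framing: "<length>:<bytes>,"
    return str(len(block)).encode("ascii") + b":" + block + b"," + body


def decode_scgi_request(raw: bytes):
    """Reverse of encode_scgi_request: return (headers dict, body bytes)."""
    length, _, rest = raw.partition(b":")
    size = int(length)
    block, body = rest[:size], rest[size + 1:]  # skip the "," after the headers
    fields = block.split(b"\x00")[:-1]  # the trailing NUL leaves one empty item
    names, values = fields[0::2], fields[1::2]
    headers = {n.decode("latin-1"): v.decode("latin-1") for n, v in zip(names, values)}
    return headers, body


raw = encode_scgi_request(
    {"REQUEST_METHOD": "POST", "REQUEST_URI": "/deepthought"},
    b"What is the answer to life?",
)
print(raw)
print(decode_scgi_request(raw))
```

The decoder never has to scan for line endings or parse free-form text; it reads a length, slices, and splits on NUL bytes.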

Why split the webserver from the appserver? Because running your application's
code is typically slow and memory-intensive; it's usually CPU-constrained.
Serving static files and parsing HTTP, meanwhile, is typically fast, cheap,
and bandwidth constrained. If you connect the app server directly to the
Internet, your memory-hogging application will sit idle much of the time while
pushing bytes out to a browser on dialup or a cell network. Splitting the
servers lets you scale them independently; typically, you need just a few
frontend load balancers to serve many app servers. It also gives you fault
tolerance, since if an app server crashes, the load balancer can retry the
request with a different one.

WSGI and Rack are HTTP interface specifications for Python and Ruby,
respectively. Basically, all your Python/Ruby code needs to run inside a
server _somewhere_ , which talks some network protocol. A number of different
web frameworks have cropped up to make programming webapps easier - things
like Django, Pylons, and Flask for Python and Rails or Merb in Ruby - and
these frameworks typically optimize for ease of programming. Similarly, a
number of different appservers have cropped up - gunicorn and uwsgi for
Python, unicorn and Mongrel and Thin for Ruby, Phusion Passenger for both -
and these typically optimize for speed. A common gateway interface lets you
mix and match between them. The reason why you need a different one for each
language is that the app server and the webapp framework are typically hosted
in the same process, communicating in-memory, and different languages have
different memory representations. You don't always, however - uWSGI, for
example, is written in C and can be used with Python, Perl, or Ruby.
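
For Python, the interface in question is small enough to show whole. Below is a minimal WSGI application plus a hand-rolled caller that plays the app server's role without any network. The names `app` and `call_app` are made up; `setup_testing_defaults` is a standard-library helper that fills in the environ keys a real server would provide.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """A minimal WSGI application: a callable taking environ and start_response."""
    body = f"Hello from {environ['PATH_INFO']}\n".encode("utf-8")
    start_response(
        "200 OK",
        [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
    )
    return [body]  # an iterable of bytes


def call_app(application, path: str):
    """Play the app server's role: build environ, call the app, collect output."""
    environ = {"REQUEST_METHOD": "GET", "PATH_INFO": path}
    setup_testing_defaults(environ)  # adds wsgi.input, SERVER_NAME, and so on
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(application(environ, start_response))
    return captured["status"], captured["headers"], body


print(call_app(app, "/users"))
```

Because `app` only depends on the calling convention, the same callable should run unchanged under any WSGI server, which is the mix-and-match property described above.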

Around 2010, people realized that HTTP was a perfectly valid transport
protocol, and started to use it in place of FastCGI and SCGI. Now virtually
all deployments use nginx talking http to one or more appservers that run the
actual webapp code.

So to a first approximation, when you make an HTTP request, your browser opens
a TCP connection to port 80 on an nginx instance somewhere. nginx parses the
request (which looks something like "GET /myapp/index.php
HTTP/1.1\r\n...headers...\r\n\r\n" - HTTP/1.1 is just a text protocol), looks in
its configuration file, and matches /myapp/*.php against the rule for some app
(pretend it's actually a Python/Django app running on the same physical server
for illustration). It then makes an HTTP request to localhost:3000 on the
server to talk to the app server. On port 3000, you have uWSGI running, which
again parses the request, populates a Python dictionary, and invokes the
callable given in the uwsgi config file. That callable will typically be
Django's entry point, where Django consults its root urlconf and routes the
request to your application code.
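
The last hop, from the callable to your code, can be sketched as a toy router. This is not Django's implementation, just the general shape of it: a list of (pattern, view) pairs consulted in order. `URLPATTERNS`, `index`, `article`, `application`, and `request` are all made-up names.

```python
import re


def index(environ):
    return "200 OK", "index page\n"


def article(environ, slug):
    return "200 OK", f"article {slug}\n"


# Toy stand-in for a root urlconf: (compiled pattern, view function) pairs.
URLPATTERNS = [
    (re.compile(r"^/$"), index),
    (re.compile(r"^/articles/(?P<slug>[\w-]+)/$"), article),
]


def application(environ, start_response):
    """The callable the app server invokes for every request."""
    path = environ.get("PATH_INFO", "/")
    for pattern, view in URLPATTERNS:
        match = pattern.match(path)
        if match:
            # Named groups in the pattern become keyword arguments to the view.
            status, text = view(environ, **match.groupdict())
            break
    else:
        status, text = "404 Not Found", "not found\n"
    body = text.encode("utf-8")
    start_response(
        status,
        [("Content-Type", "text/plain; charset=utf-8"), ("Content-Length", str(len(body)))],
    )
    return [body]


def request(path: str):
    """Invoke the callable the way an app server would, minus the network."""
    seen = {}

    def start_response(status, headers, exc_info=None):
        seen["status"] = status

    environ = {"REQUEST_METHOD": "GET", "PATH_INFO": path}
    body = b"".join(application(environ, start_response))
    return seen["status"], body


print(request("/articles/hello-world/"))
print(request("/nope"))
```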

~~~
miles932
Not any more! ;)

~~~
fizwhiz
Care to elaborate?

~~~
miles932
Well, by typing out such a well thought out and complete answer, I'd say its
value as an interview question has been somewhat diminished. Otherwise, a
great writeup!

~~~
nostrademons
You can still ask "What's going on at the byte level on the wire?" or "What's
going on in the GPU when it renders the page?" or "What's going on inside
Django when it computes what code to invoke?" or "What's going on inside the
Python interpreter when it executes it?" The beauty of the question is that
it's almost infinitely deep, and there's nowhere near enough time to answer it
all in an interview, and the depth to which the candidate _can_ go tells you a
lot about where their specialty lies.

------
rockerBOO
Apache is a web server that adds language interpreters as modules (such as
PHP). NGINX is a reverse proxy/web server that can proxy to other processes
listening on other ports (for example, PHP-FastCGI running on port 3000).

BROWSER -> NGINX (80) -> PHP-FastCGI (3000) -> NGINX -> BROWSER

NGINX can also proxy over Unix domain sockets:

BROWSER -> NGINX (80) -> PHP-FastCGI (/usr/socks/php.socks) -> NGINX -> BROWSER

CGI just defines the interface, and the language would be implementing the
interface.
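
Concretely, for classic CGI the interface is: the web server starts your program, puts request details in environment variables (REQUEST_METHOD, QUERY_STRING, CONTENT_LENGTH, and others), feeds the request body to stdin, and reads headers plus body from stdout. A Python sketch, with the made-up names `cgi_main` and `simulate_server_call` and in-memory streams standing in for the real stdin/stdout:

```python
import io


def cgi_main(environ, stdin, stdout) -> None:
    """Behave like a CGI program: read env vars and stdin, write headers and body."""
    method = environ.get("REQUEST_METHOD", "GET")
    query = environ.get("QUERY_STRING", "")
    length = int(environ.get("CONTENT_LENGTH") or 0)
    posted = stdin.read(length) if length else ""
    body = f"method={method}\nquery={query}\nposted={posted}\n"
    # Output is headers, then a blank line, then the body. The web server
    # turns this into a full HTTP response (status line and so on).
    stdout.write("Content-Type: text/plain\r\n")
    stdout.write("\r\n")
    stdout.write(body)


def simulate_server_call() -> str:
    """Do by hand what the web server would do for a POST to /script?a=1."""
    environ = {"REQUEST_METHOD": "POST", "QUERY_STRING": "a=1", "CONTENT_LENGTH": "5"}
    out = io.StringIO()
    cgi_main(environ, io.StringIO("hello"), out)
    return out.getvalue()


# A real CGI script would instead call:
#   cgi_main(os.environ, sys.stdin, sys.stdout)
print(simulate_server_call())
```

FastCGI keeps the same idea (named parameters plus a body in, headers plus a body out) but sends them over a socket to a long-running process instead of starting a new program per request.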

~~~
dnotrael
Very succinct answer! Thank you. I am still a bit confused as to what
interface CGI exposes. Is it a way for a programming language to take an HTTP
request? As in, PHP-FastCGI is an implementation in PHP of receiving the HTTP
request as data, and then calling the appropriate handler in the application?

