A Primer on Proxies

ThePhysicist · on March 21, 2022

In my experience HTTP/2 doesn't boost proxy performance significantly, at least for HTTPS workloads. Normally the client sends a single CONNECT [hostname] request to the proxy and after that the proxy just forwards TCP packets on that connection, which is nothing that HTTP/2 can improve. A client can of course use the same HTTP/2 connection for multiple CONNECT requests, but opening several separate connections to the proxy in parallel isn't much more costly, as those tunnels tend to be long-lived and will often carry multiplexed HTTP/2 connections themselves, so further multiplexing at the proxy level often has little effect (in my experience at least).
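To make the framing concrete, here is a minimal Python sketch of what a client puts on the wire for CONNECT over HTTP/1.1, and the pseudo-header list it sends per stream over HTTP/2 (RFC 9113, section 8.5). The function names are hypothetical, and the HTTP/2 part shows plain header lists, not HPACK-encoded frames. In both cases the proxy sees one small request per tunnel; everything after it is opaque tunnel data.

```python
def h1_connect(authority: str) -> bytes:
    """HTTP/1.1 CONNECT: the request target is "host:port" and Host repeats it."""
    return f"CONNECT {authority} HTTP/1.1\r\nHost: {authority}\r\n\r\n".encode("ascii")


def connect_succeeded(response_head: bytes) -> bool:
    """Any 2xx status opens the tunnel; bytes after the blank line are tunnel data."""
    status_line = response_head.split(b"\r\n", 1)[0]
    parts = status_line.split(b" ", 2)
    return len(parts) >= 2 and len(parts[1]) == 3 and parts[1].startswith(b"2")


def h2_connect(stream_id: int, authority: str) -> tuple:
    """HTTP/2 CONNECT: only :method and :authority; :scheme and :path are omitted.

    Each tunnel is its own client-initiated (odd-numbered) stream, so several
    tunnels can share one connection to the proxy.
    """
    if stream_id % 2 != 1:
        raise ValueError("client-initiated HTTP/2 streams use odd IDs")
    return stream_id, [(":method", "CONNECT"), (":authority", authority)]


# Three tunnels multiplexed over a single HTTP/2 connection to the proxy.
tunnels = [h2_connect(sid, host) for sid, host in
           zip((1, 3, 5), ("a.example:443", "b.example:443", "c.example:443"))]
```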

Regarding QUIC/MASQUE, it's still difficult to find any clients that support it. I think Chrome supports using QUIC to connect to a regular HTTP CONNECT proxy, but I don't know of any browsers that have even experimental support for MASQUE as of now. It will be pretty great, though, once it's supported more widely.

Liuser · on March 21, 2022

> Normally the client sends a single CONNECT [hostname] request to the proxy and after that the proxy just forwards TCP packets on that connection, which is nothing that HTTP/2 can improve.

Trying to test my understanding: does the CONNECT HTTP protocol need to match the underlying payload protocol? E.g. after a CONNECT tunnel over HTTP/1.1 is established, it’s still possible for the client to use HTTP/2 with the upstream server for its underlying payloads, correct?

My intuition is that it doesn’t need to match because the proxy has no way to know what http protocol is being used when the workload is encrypted.

simmervigor · on March 21, 2022

The vanilla CONNECT method is an instruction to open a TCP connection to the target server. What is sent over that is entirely up to the client and the target server; it doesn't need to match. It's often TLS carrying HTTP, but it could be anything.
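To see that the tunnel really is protocol-agnostic, here is a self-contained Python sketch: a toy CONNECT proxy (the hypothetical `handle_connect`) and a stand-in target that just echoes bytes, both on localhost. After the 200 response the client sends bytes that are neither HTTP nor TLS, and they come back unchanged. It handles a single connection and skips authentication and most error handling, so treat it as an illustration, not a production proxy.

```python
import socket
import threading


def pipe(src: socket.socket, dst: socket.socket) -> None:
    """Copy bytes from src to dst until src reaches EOF, then half-close dst."""
    try:
        while True:
            data = src.recv(4096)
            if not data:
                break
            dst.sendall(data)
    except OSError:
        pass
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass


def read_head(sock: socket.socket) -> tuple:
    """Read up to the blank line ending an HTTP head; return (head, leftover bytes)."""
    buf = b""
    while b"\r\n\r\n" not in buf:
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("connection closed before end of headers")
        buf += chunk
    head, rest = buf.split(b"\r\n\r\n", 1)
    return head, rest


def handle_connect(client: socket.socket) -> None:
    """Minimal CONNECT proxy: open TCP to the target, then relay opaque bytes."""
    head, rest = read_head(client)
    method, authority, _version = head.split(b"\r\n", 1)[0].decode("ascii").split(" ")
    if method != "CONNECT":
        client.sendall(b"HTTP/1.1 405 Method Not Allowed\r\n\r\n")
        client.close()
        return
    host, port = authority.rsplit(":", 1)
    upstream = socket.create_connection((host, int(port)))
    client.sendall(b"HTTP/1.1 200 Connection Established\r\n\r\n")
    if rest:
        upstream.sendall(rest)  # bytes sent early already belong to the tunnel
    other = threading.Thread(target=pipe, args=(client, upstream), daemon=True)
    other.start()
    pipe(upstream, client)
    other.join()
    client.close()
    upstream.close()


def echo_once(listener: socket.socket) -> None:
    """Stand-in target server: echo whatever arrives, whatever protocol it is."""
    conn, _ = listener.accept()
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:
                break
            conn.sendall(data)


def run_demo(payload: bytes) -> tuple:
    target = socket.create_server(("127.0.0.1", 0))
    proxy = socket.create_server(("127.0.0.1", 0))
    port = target.getsockname()[1]
    threading.Thread(target=echo_once, args=(target,), daemon=True).start()
    threading.Thread(target=lambda: handle_connect(proxy.accept()[0]), daemon=True).start()
    with socket.create_connection(proxy.getsockname()) as c:
        c.sendall(f"CONNECT 127.0.0.1:{port} HTTP/1.1\r\nHost: 127.0.0.1:{port}\r\n\r\n".encode())
        head, received = read_head(c)
        c.sendall(payload)  # not HTTP, not TLS: the proxy doesn't care
        c.shutdown(socket.SHUT_WR)
        while True:
            data = c.recv(4096)
            if not data:
                break
            received += data
    target.close()
    proxy.close()
    return head.split(b"\r\n")[0].decode(), received


status, echoed = run_demo(b"\x00\x01 definitely not HTTP \xff")
print(status, echoed)
```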

The proxy could inspect the traffic it is carrying and try to enforce some access control or policy. However, the use of TLS or other encrypted protocols limits its ability to see what is happening. Then you get into a different logical layer, namely whether there is MITM happening, but that is tangential to the conventional use case for an HTTP proxy.
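Even with TLS, a CONNECT proxy still sees some metadata: the target in the CONNECT line, and usually the hostname again in the cleartext SNI extension of the TLS ClientHello (unless Encrypted Client Hello is in use). Below is a rough Python sketch of peeking at SNI without decrypting anything. The parser is illustrative, not hardened, and assumes the whole ClientHello fits in the first TLS record. To stay self-contained it generates a real ClientHello with the standard `ssl` module's in-memory BIOs rather than capturing live traffic.

```python
import ssl
from typing import Optional


def extract_sni(data: bytes) -> Optional[str]:
    """Return the server_name from a TLS ClientHello record, or None."""
    try:
        if data[0] != 0x16:                       # not a TLS handshake record
            return None
        body = data[5:5 + int.from_bytes(data[3:5], "big")]
        if body[0] != 0x01:                       # not a ClientHello
            return None
        pos = 4 + 2 + 32                          # handshake header, legacy_version, random
        pos += 1 + body[pos]                      # legacy_session_id
        pos += 2 + int.from_bytes(body[pos:pos + 2], "big")   # cipher_suites
        pos += 1 + body[pos]                      # legacy_compression_methods
        end = pos + 2 + int.from_bytes(body[pos:pos + 2], "big")
        pos += 2
        while pos + 4 <= end:
            ext_type = int.from_bytes(body[pos:pos + 2], "big")
            ext_len = int.from_bytes(body[pos + 2:pos + 4], "big")
            ext = body[pos + 4:pos + 4 + ext_len]
            # server_name (type 0): list length (2), name_type (1, 0 = host_name), length (2), name
            if ext_type == 0 and ext[2] == 0:
                name_len = int.from_bytes(ext[3:5], "big")
                return ext[5:5 + name_len].decode("ascii")
            pos += 4 + ext_len
    except IndexError:
        return None
    return None


def sample_client_hello(hostname: str) -> bytes:
    """Produce a real ClientHello with Python's ssl module, without network I/O."""
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    tls = ssl.create_default_context().wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        pass  # expected: the handshake now waits for a ServerHello that never comes
    return outgoing.read()


print(extract_sni(sample_client_hello("example.com")))
```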

jgrahamc · on March 21, 2022

The follow up is also a good read: https://blog.cloudflare.com/unlocking-quic-proxying-potentia...

ignoramous · on March 21, 2022

Tommy Pauly's EPIQCon 2021 keynote QUIC at Apple is pretty interesting as well: https://www.youtube-nocookie.com/embed/nP1yzxHcgeM

> What if we wanted to proxy QUIC? What if we wanted to proxy entire IP datagrams, similar to VPN technologies like IPsec or WireGuard? This is where MASQUE comes in.

Ref: https://ietf-wg-masque.github.io/draft-ietf-masque-connect-i...
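The linked draft covers CONNECT-IP; its simpler sibling CONNECT-UDP (RFC 9298) is what proxies QUIC. The client sends an Extended CONNECT with `:protocol` set to `connect-udp` to a URL expanded from the proxy's URI template, then exchanges HTTP Datagrams whose payload is a QUIC variable-length Context ID (0 for UDP payloads) followed by the UDP payload. A small Python sketch of those two pieces; the proxy hostname is hypothetical, and the path follows the well-known default used in RFC 9298, whereas a real proxy tells clients which template to use.

```python
from urllib.parse import quote

# Assumed template for a hypothetical proxy at proxy.example.
TEMPLATE = "https://proxy.example/.well-known/masque/udp/{target_host}/{target_port}/"


def expand_template(template: str, host: str, port: int) -> str:
    """Fill in the target; IPv6 colons must be percent-encoded in the path."""
    return (template.replace("{target_host}", quote(host, safe=""))
                    .replace("{target_port}", str(port)))


def encode_varint(value: int) -> bytes:
    """QUIC variable-length integer (RFC 9000, section 16)."""
    if value < 0 or value >= 1 << 62:
        raise ValueError("varint out of range")
    if value < 1 << 6:
        return value.to_bytes(1, "big")
    if value < 1 << 14:
        return (value | 0x4000).to_bytes(2, "big")
    if value < 1 << 30:
        return (value | 0x8000_0000).to_bytes(4, "big")
    return (value | 0xC000_0000_0000_0000).to_bytes(8, "big")


def udp_proxying_payload(udp_payload: bytes) -> bytes:
    """HTTP Datagram body for CONNECT-UDP: Context ID 0, then the raw UDP payload."""
    return encode_varint(0) + udp_payload


url = expand_template(TEMPLATE, "2001:db8::42", 443)
datagram = udp_proxying_payload(b"QUIC packet bytes would go here")
print(url)
```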

eptcyka · on March 21, 2022

Thanks for the link to the presentation, I was just debugging some weirdness with PF and unblockable QUIC traffic on macOS Monterey.

ignoramous · on March 23, 2022

Np. But you must fill us in: why was macOS QUIC unblockable at the firewall? And how did the presentation help?

eptcyka · on March 23, 2022

I will, once I've figured out what is actually going on. Maybe it's as banal as a rule, added to PF by some macOS component, that always allows traffic to a specific IP.

whoknew1122 · on March 21, 2022

I always enjoy the articles Cloudflare publishes. I still really don't like the company, but good job to the individuals who wrote the primer.

datalopers · on March 21, 2022

> I still really don't like the company

Is that because they provide TLS termination for an increasingly large chunk of all internet traffic?

whoknew1122 · on March 21, 2022

I really don't want to get sidetracked into a political or philosophical conversation. But to me, a policy of content neutrality is tacitly supporting extremist content. Cloudflare has to be dragged kicking and screaming before it stops profiting from and protecting extremist sites and sites that host illegal content.

The concentration of internet infrastructure into a few companies is something to be concerned about. But my bigger beef with Cloudflare is that they profit from and protect despicable websites and only cave when public pressure is high enough.

mqnfred · on March 21, 2022

That, to me, is a reason to like the company rather than not... Let's see where you stand once your opinions are on the chopping block.

whoknew1122 · on March 21, 2022

The company sure made a courageous stance protecting a site which advocates explicitly for accelerationist race wars, and another image board that was notorious for housing child abuse imagery and conspiracies which led to real-world deaths.

The answer to where a company should draw the line isn't 'OMG, slippery slope!' It's 'somewhere.' And preferably the CEO of the company shouldn't agonize about getting rid of stuff like the Daily Stormer and 8chan.

von_lohengramm · on March 21, 2022

> and another image board that was notorious for housing child abuse imagery and conspiracies which led to real-world deaths.

You know what, maybe you're right. Maybe Cloudflare should ban reddit.

von_lohengramm · on March 21, 2022

> But to me, a policy of content neutrality is tacitly supporting extremist content.

When I read this, I agreed, but as I continued I realized that we come to opposite conclusions. A platform that espouses neutrality and fairness _should_ support extremist content, but it should be the reader's job to determine if the content is despicable.

matt-attack · on March 21, 2022

Is it ok if I don’t like the term “reverse proxy”?

I find it entirely confusing and non-intuitive. I put it up there with idiotic terms like “OTT” which AFAIK just means “connected to the internet”.

jgrahamc · on March 21, 2022

Has always bothered me also, but that's the industry term so we are kind of stuck with it.

simmervigor · on March 21, 2022

Proxy components are officially called intermediaries in the HTTP Semantics specification; see https://httpwg.org/http-core/draft-ietf-httpbis-semantics-la....

Intermediaries can have different purposes. The official alternative to reverse proxy is "gateway", which is unfortunately overloaded with other kinds of gateways in networking.

Naming things is hard. Reverse proxy isn't great, but all things considered it is distinctive enough to let folks distinguish the sort of HTTP proxying that is happening.

westurner · on March 22, 2022

An HTTP reverse proxy forwards HTTP requests and adds e.g. X-Forwarded-For and X-Forwarded-Host headers.

https://www.nginx.com/resources/wiki/start/topics/examples/f... :

  X-Forwarded-For: 12.34.56.78, 23.45.67.89
  X-Real-IP: 12.34.56.78
  X-Forwarded-Host: example.com
  X-Forwarded-Proto: https

TIL from the nginx docs that there's a standardized header (RFC 7239) that carries the same information without the unregistered X- prefixed headers:

  Forwarded: for=12.34.56.78;host=example.com;proto=https, for=23.45.67.89
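A minimal Python sketch of how a proxy might build both forms: one RFC 7239 `Forwarded` element per hop, and the legacy append-to-`X-Forwarded-For` behaviour (similar in spirit to nginx's `$proxy_add_x_forwarded_for`, which is the incoming header with the peer address appended). The helper names are hypothetical. One detail worth showing: IPv6 nodes must be bracketed and, because `:` and `[` are not token characters, quoted.

```python
import ipaddress

# Characters allowed in an unquoted HTTP token ("tchar").
TCHARS = set("!#$%&'*+-.^_`|~0123456789"
             "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")


def _param(value: str) -> str:
    """Emit value as a token if possible, otherwise as a quoted-string."""
    if value and all(c in TCHARS for c in value):
        return value
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def forwarded_element(for_=None, host=None, proto=None) -> str:
    """One hop of an RFC 7239 Forwarded header; IPv6 nodes go in brackets."""
    pairs = []
    if for_ is not None:
        try:
            ip = ipaddress.ip_address(for_)
            node = f"[{ip}]" if ip.version == 6 else str(ip)
        except ValueError:
            node = for_  # e.g. "unknown" or an obfuscated identifier like "_proxy1"
        pairs.append("for=" + _param(node))
    if host is not None:
        pairs.append("host=" + _param(host))
    if proto is not None:
        pairs.append("proto=" + _param(proto))
    return ";".join(pairs)


def append_x_forwarded_for(existing, peer_ip: str) -> str:
    """Legacy header: append the address of the peer this proxy received from."""
    return f"{existing}, {peer_ip}" if existing else peer_ip


header = ", ".join([forwarded_element("12.34.56.78", "example.com", "https"),
                    forwarded_element("23.45.67.89")])
print("Forwarded:", header)
```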

What is the difference between a reverse proxy and a load balancer?

k8s calls this "Ingress", and there are multiple implementers of the "Ingress API", which essentially must reload the upstream server list on SIGHUP: https://kubernetes.io/docs/concepts/services-networking/ingr...

List of k8s Ingress Controllers: https://kubernetes.io/docs/concepts/services-networking/ingr...

dilyevsky · on March 22, 2022

> What is the difference between a reverse proxy and a load balancer?

A reverse proxy may or may not load-balance requests. For example, in a sidecar configuration it can just terminate TLS, provide telemetry, etc., and forward everything to a local port.
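The distinction can be seen in a toy reverse proxy: give it one upstream and it just forwards (the sidecar case); give it several and the same code round-robins, which is the simplest form of load balancing. A standard-library-only Python sketch, GET only, with hypothetical names; it strips hop-by-hop headers and appends `X-Forwarded-For`, but ignores streaming, keep-alive and health checks.

```python
import http.client
import http.server
import itertools
import threading

# Hop-by-hop headers describe a single connection and must not be forwarded.
HOP_BY_HOP = {"connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
              "te", "trailers", "transfer-encoding", "upgrade"}
# http.server adds its own Server/Date, and Content-Length is recomputed.
SKIP_RESPONSE = HOP_BY_HOP | {"server", "date", "content-length"}


def make_proxy(upstreams):
    """One upstream: plain forwarding. Several: round-robin load balancing."""
    ring = itertools.cycle(upstreams)
    lock = threading.Lock()

    class Proxy(http.server.BaseHTTPRequestHandler):
        def do_GET(self):
            with lock:
                host, port = next(ring)
            headers = {k: v for k, v in self.headers.items()
                       if k.lower() not in HOP_BY_HOP | {"x-forwarded-for"}}
            prior, peer = self.headers.get("X-Forwarded-For"), self.client_address[0]
            headers["X-Forwarded-For"] = f"{prior}, {peer}" if prior else peer
            upstream = http.client.HTTPConnection(host, port, timeout=5)
            try:
                upstream.request("GET", self.path, headers=headers)
                resp = upstream.getresponse()
                status, resp_headers, body = resp.status, resp.getheaders(), resp.read()
            except (OSError, http.client.HTTPException):
                status, resp_headers, body = 502, [], b"bad gateway\n"
            finally:
                upstream.close()
            self.send_response(status)
            for k, v in resp_headers:
                if k.lower() not in SKIP_RESPONSE:
                    self.send_header(k, v)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep the demo quiet

    return Proxy


def make_backend(name):
    class Backend(http.server.BaseHTTPRequestHandler):
        def do_GET(self):
            body = f"{name} xff={self.headers.get('X-Forwarded-For')}".encode()
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass

    return Backend


def start(handler):
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def get(address, path="/"):
    conn = http.client.HTTPConnection(*address, timeout=5)
    conn.request("GET", path)
    body = conn.getresponse().read().decode()
    conn.close()
    return body


backend_a, backend_b = start(make_backend("a")), start(make_backend("b"))
proxy = start(make_proxy([backend_a.server_address, backend_b.server_address]))
print([get(proxy.server_address) for _ in range(3)])
```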

westurner · on March 22, 2022

A [load-balancing] reverse proxy can also keep WAF rules in RAM for processing requests and responses. WAF: Web Application Firewall (e.g. the OWASP CRS ruleset or the CF ruleset).
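The "rules in RAM" idea in its most stripped-down form: compile the patterns once at startup and evaluate them against every request in the proxy's hot path. The three rules below are hypothetical toys for illustration; real rulesets such as the OWASP CRS are far larger, use anomaly scoring, and run in engines like ModSecurity or Coraza.

```python
import re
from urllib.parse import unquote_plus

# Toy rules (hypothetical names and patterns), compiled once and kept in memory.
RULES = [
    ("sqli-union-select", re.compile(r"\bunion\b\s+(all\s+)?\bselect\b", re.IGNORECASE)),
    ("path-traversal", re.compile(r"\.\./")),
    ("xss-script-tag", re.compile(r"<\s*script\b", re.IGNORECASE)),
]


def inspect_request(target: str, body: str = "") -> list:
    """Return names of matching rules; an empty list means "let it through"."""
    text = unquote_plus(target) + "\n" + body
    return [name for name, pattern in RULES if pattern.search(text)]


print(inspect_request("/search?q=1+UNION+SELECT+password"))
```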

Methods for delegating HTTP requests to another application, each with per-message overhead and buffering that inevitably has to be tuned:

- Layer 2 (MAC on a local segment)
- Layer 3 (IP)
- Layer 4 (TCP and UDP ports)
- Layer 7: parse HTTP and forward it over network sockets or file sockets, or defy separation of concerns and least privilege and run the app (e.g. non-blocking Lua) within the web server
- Layer 7+: container service mesh Ingress API

E.g. FastCGI can use file (Unix domain) sockets, which avoid the additional TCP overhead but don't really scale beyond one host, because such sockets only work locally and not across network filesystems.
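A small Python sketch of the "file socket" path on a Unix-like system: the front end connects to a filesystem path instead of host:port, which also shows why it can't span machines. For brevity this speaks plain HTTP over the socket (as when proxying to an app server bound to a Unix socket) rather than FastCGI's binary record format, and the socket path is hypothetical.

```python
import os
import socket
import tempfile
import threading


def serve_once(listener: socket.socket) -> None:
    """Answer a single HTTP request arriving on the Unix socket."""
    conn, _ = listener.accept()
    with conn:
        request = b""
        while b"\r\n\r\n" not in request:
            chunk = conn.recv(4096)
            if not chunk:
                return
            request += chunk
        body = b"hello over a Unix domain socket\n"
        conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: %d\r\nConnection: close\r\n\r\n"
                     % len(body) + body)


def fetch_over_unix_socket(path: str) -> bytes:
    """What the front-end proxy does: connect to a filesystem path, not host:port."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(path)
        s.sendall(b"GET / HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n")
        response = b""
        while True:
            chunk = s.recv(4096)
            if not chunk:
                break
            response += chunk
    return response


def demo() -> bytes:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "app.sock")  # hypothetical socket path
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as listener:
            listener.bind(path)
            listener.listen(1)
            server = threading.Thread(target=serve_once, args=(listener,))
            server.start()
            response = fetch_over_unix_socket(path)
            server.join()
    return response


print(demo().decode())
```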

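(ASGI, the Asynchronous Server Gateway Interface, is the async successor to WSGI. WSGI specifies CGI-style $ENVIRONMENT_VARIABLE names, passed in the environ dict, as an interface contract in order to decouple web [[reverse] proxy] servers from web applications; ASGI passes a similar "scope" dict instead.)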
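A minimal Python sketch of that contract from the application's side, using only `wsgiref` from the standard library: the app reads CGI-style keys such as REQUEST_METHOD, PATH_INFO, REMOTE_ADDR and REMOTE_USER out of `environ`, and simply believes whatever the server (or whatever sits in front of it) put there. The `call_app` helper is hypothetical and invokes the app directly, with no server involved.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """A WSGI app reading CGI-style keys; it trusts whatever the server set."""
    remote = environ.get("REMOTE_ADDR", "unknown")
    user = environ.get("REMOTE_USER", "-")
    body = f"{environ['REQUEST_METHOD']} {environ['PATH_INFO']} from {remote} as {user}".encode()
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def call_app(wsgi_app, **environ_overrides):
    """Invoke a WSGI app directly with a filled-in environ (no server needed)."""
    environ = dict(environ_overrides)
    setup_testing_defaults(environ)
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        return lambda data: None  # the legacy write() callable, unused here

    body = b"".join(wsgi_app(environ, start_response))
    return captured["status"], body


print(call_app(app, PATH_INFO="/private", REMOTE_ADDR="10.0.0.5", REMOTE_USER="alice"))
```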

Fundamentally: which variables passed in e.g. the os.environ dict, like $REMOTE_USER and (IDK, is it something like $SSL_CLIENT_CERT_SHA384?) $SSL_CLIENT_CERT_*, should downstream web applications simply trust as valid strings, and over what network path?
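For the client address, one common answer is sketched below in Python: honour `X-Forwarded-For` only when the direct peer is a proxy you operate, then walk the list from the right and stop at the first hop you don't trust. The same reasoning applies to REMOTE_USER or client-certificate values: they are only as trustworthy as every hop that could have set them. The trusted networks here are a hypothetical deployment, not a recommendation.

```python
import ipaddress

# Hypothetical deployment: only these networks run our own reverse proxies.
TRUSTED_PROXIES = [ipaddress.ip_network("10.0.0.0/8"), ipaddress.ip_network("127.0.0.0/8")]


def is_trusted(addr: str) -> bool:
    try:
        ip = ipaddress.ip_address(addr.strip())
    except ValueError:
        return False  # garbage in the header is never trusted
    return any(ip in net for net in TRUSTED_PROXIES)


def effective_client_ip(remote_addr: str, x_forwarded_for: str = "") -> str:
    """Pick the client address, honouring X-Forwarded-For only via trusted hops."""
    if not is_trusted(remote_addr):
        return remote_addr  # peer is not our proxy: ignore the header entirely
    hops = [h.strip() for h in x_forwarded_for.split(",") if h.strip()]
    for hop in reversed(hops):  # rightmost entries were added by the closest proxies
        if not is_trusted(hop):
            return hop  # first untrusted hop is the best guess at the real client
    return hops[0] if hops else remote_addr


print(effective_client_ip("10.0.0.2", "6.6.6.6, 198.51.100.7"))
```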

TLS re-termination.

Non-root [web] servers must run on ports of 1024 or above; e.g. iptables or nftables (or eBPF) can easily port-forward 80/443 to those, but only if rewriting URLs within potentially-signed assets within HTTP messages and HTTP/3 UDP streams isn't necessary.
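A Python sketch of the port side of this. On Linux 4.11+ the threshold is the `net.ipv4.ip_unprivileged_port_start` sysctl (default 1024), readable under /proc; granting the binary CAP_NET_BIND_SERVICE is the other usual route. The helper only prints the iptables REDIRECT rules an operator might apply (TCP for HTTPS, UDP for HTTP/3); it doesn't run them, and 8443 is an arbitrary example port. URL rewriting is out of scope here.

```python
def unprivileged_port_start(path: str = "/proc/sys/net/ipv4/ip_unprivileged_port_start") -> int:
    """Lowest port a process without CAP_NET_BIND_SERVICE may bind.

    Linux 4.11+ exposes this as a sysctl (default 1024); elsewhere assume 1024.
    """
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return 1024


def redirect_rule(public_port: int, internal_port: int, proto: str = "tcp") -> str:
    """Build an iptables NAT rule redirecting inbound public_port to internal_port.

    PREROUTING only sees packets arriving from other hosts; connections made
    from the machine itself would also need a rule in the nat OUTPUT chain.
    """
    return (f"iptables -t nat -A PREROUTING -p {proto} --dport {public_port} "
            f"-j REDIRECT --to-ports {internal_port}")


threshold = unprivileged_port_start()
for proto in ("tcp", "udp"):  # HTTPS over TCP, HTTP/3 over UDP
    if 443 < threshold:
        print(redirect_rule(443, 8443, proto))
```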

yjftsjthsd-h · on March 21, 2022

What would be a more intuitive term for it?

unethical_ban · on March 21, 2022

Server-side proxy, or inbound proxy, perhaps.

latchkey · on March 21, 2022

It would be nice if there was a way to more easily debug caching issues with CF workers being used as proxies. There is very little visibility into how headers affect things and it is poking in the dark to make it all work.

kazinator · on March 22, 2022

I haven't paid much attention to this stuff, but recently I discovered a trick for slowly migrating the web services on an old server to a new one: via reverse proxy definitions on the new server that pass through selective paths to the old one. To the outside, it looks like one server. The new server has the SSL identity with up-to-date crypto and all and terminates the HTTPS connections. As I'm doing this, suddenly it hits me: oh, so this is Cloudflare in a mini nutshell?
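The routing decision behind that migration trick fits in a few lines: match the request path against a table of prefixes, let the longest matching prefix win (roughly how nginx picks among prefix `location` blocks), and serve anything unmatched on the new server. A Python sketch; the old server's address and the paths are hypothetical.

```python
# Hypothetical addresses and paths. None means "serve on the new server".
OLD_SERVER = "http://old-server.internal:8080"
ROUTES = {
    "/wiki/": OLD_SERVER,
    "/wiki/search/": None,  # already migrated, even though the rest of /wiki/ is not
    "/cgi-bin/": OLD_SERVER,
}


def route(path: str):
    """Longest matching prefix wins; unmatched paths are served locally."""
    best_prefix, upstream = "", None
    for prefix, target in ROUTES.items():
        if path.startswith(prefix) and len(prefix) > len(best_prefix):
            best_prefix, upstream = prefix, target
    return upstream


for p in ("/wiki/Main_Page", "/wiki/search/?q=proxy", "/blog/"):
    print(p, "->", route(p) or "new server")
```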