
Abusing Linux's firewall: the hack that allowed us to build Spectrum - jgrahamc
https://blog.cloudflare.com/how-we-built-spectrum/?a
======
mrb
I built an HTTP service that listens on all 65535 TCP ports and tells you
which port you connected to (very useful to diagnose which outbound ports are
firewalled by ISPs or by Wifi networks):

[http://open.zorinaq.com/](http://open.zorinaq.com/)

The folks at Cloudflare have done it with an iptables TPROXY rule (which
requires the socket to have the IP_TRANSPARENT option) which is how I did it
too. But there is another way to do this in Linux: you can use an iptables
REDIRECT rule, and the userspace program can obtain the original destination
port by doing a getsockopt() call to read SO_ORIGINAL_DST.
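
A minimal Python sketch of the REDIRECT approach, assuming Linux and an accepted IPv4 TCP socket whose connection was redirected by netfilter. The value 80 for `SO_ORIGINAL_DST` comes from `<linux/netfilter_ipv4.h>`; the helper names `parse_sockaddr_in` and `original_dst` are made up for illustration:

```python
import socket
import struct

# From <linux/netfilter_ipv4.h>; Python's socket module does not export it.
SO_ORIGINAL_DST = 80


def parse_sockaddr_in(raw):
    """Decode a 16-byte struct sockaddr_in into (ip, port).

    Layout: 2 bytes family, 2 bytes port (network order),
    4 bytes IPv4 address (network order), 8 bytes padding.
    """
    port = struct.unpack_from("!H", raw, 2)[0]
    ip = socket.inet_ntoa(raw[4:8])
    return ip, port


def original_dst(conn):
    """Return the pre-REDIRECT destination of an accepted IPv4 socket.

    Only meaningful on Linux for connections tracked by conntrack;
    the call raises OSError otherwise.
    """
    # IPPROTO_IP has the same value (0) as SOL_IP on Linux.
    raw = conn.getsockopt(socket.IPPROTO_IP, SO_ORIGINAL_DST, 16)
    return parse_sockaddr_in(raw)
```

A server would call `original_dst(conn)` on each socket returned by `accept()` and report the port from the result.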

Edit: oh I see now the blog post does mention the REDIRECT & SO_ORIGINAL_DST
option, but criticizes its performance... which makes sense given its
dependence on conntrack.

There is a typo in Cloudflare's blog post: s/SO_TRANSPARENT/IP_TRANSPARENT/

~~~
jiveturkey
neat. but why do you need a service at all to detect blocking? you can use
timing to also do it easily without the need for any server component at all.

perhaps this works poorly for firewalls near to the service but you declared
the problem to be one close to the client. AIUI

~~~
mrb
When an ISP blocks certain ports by dropping the SYN packet, the client sees a
timeout. There is nothing to "time" that can prove it's the ISP dropping it.

~~~
jiveturkey
yes there is. When you don't get a RST back at the expected time (say *2), you
know SYN was dropped. Are you arguing that it could be packet loss? You
address that by taking multiple samples, and by comparing against loss to
ports that you get ACK back from.
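
For reference, the three client-side outcomes being argued about (handshake completes, RST comes back, nothing comes back) can be told apart with a plain connect attempt. This is a rough Python sketch, not a full measurement tool; the function name `probe` and the 2-second default timeout are arbitrary choices, and by itself it says nothing about which hop dropped the SYN:

```python
import socket
import time


def probe(host, port, timeout=2.0):
    """Attempt one TCP connect and classify the outcome.

    Returns (outcome, seconds), where outcome is:
      "open"    - the three-way handshake completed
      "refused" - an RST came back
      "timeout" - no answer arrived within `timeout` seconds
    Other errors (e.g. network unreachable) propagate as OSError.
    """
    start = time.monotonic()
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        outcome = "open"
    except ConnectionRefusedError:
        outcome = "refused"
    except socket.timeout:
        outcome = "timeout"
    finally:
        sock.close()
    return outcome, time.monotonic() - start
```

Repeating `probe()` several times per port and comparing the elapsed times of "refused" results against "timeout" results is the sampling idea described above.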

~~~
mrb
« _you know SYN was dropped_ »

But you don't know who dropped it: the ISP or the remote server. In order to
show it's the network between the client and server dropping it, you need a
server that behaves in a known way, hence open.zorinaq.com. I used to work in
the InfoSec industry, running port scans from various locations, and
open.zorinaq.com was incredibly useful to ensure there was no random firewall
preventing us from finding certain open ports. That was my primary motivation
for building the service.

------
majke
Author here. The TPROXY module is pretty special; it really would have been
hard to handle any inbound port without it. I guess it shows that there are
benefits in keeping firewall and network stack code closely tied.

~~~
zng00
This is great, thanks for sharing. I'm curious about the downstream proxy
process (i.e. ::1234) and how you scale it and balance load across multiple
instances of the process. You can't really use iptables to load balance your
processes as either the DNAT or REDIRECT mechanism will modify the destination
address, right?

Ex.:

    # TPROXY directs all traffic to :1234, and these rules load balance to 4
    # different processes
    iptables -t nat -I OUTPUT -p tcp -o lo --dport 1234 -m state --state NEW \
      -m statistic --mode nth --every 4 --packet 0 \
      -j DNAT --to-destination 127.0.0.1:8080
    iptables -t nat -I OUTPUT -p tcp -o lo --dport 1234 -m state --state NEW \
      -m statistic --mode nth --every 4 --packet 1 \
      -j DNAT --to-destination 127.0.0.1:8081
    iptables -t nat -I OUTPUT -p tcp -o lo --dport 1234 -m state --state NEW \
      -m statistic --mode nth --every 4 --packet 2 \
      -j DNAT --to-destination 127.0.0.1:8082
    iptables -t nat -I OUTPUT -p tcp -o lo --dport 1234 -m state --state NEW \
      -m statistic --mode nth --every 4 --packet 3 \
      -j DNAT --to-destination 127.0.0.1:8083

~~~
majke
We have a single accept queue for all the ports. For TCP it doesn't create any
problems - the new connection rate is rarely significant.

For the accept-queue load balancing see these blog posts:

[https://blog.cloudflare.com/the-sad-state-of-linux-socket-ba...](https://blog.cloudflare.com/the-sad-state-of-linux-socket-balancing/)

[https://blog.cloudflare.com/syn-packet-handling-in-the-wild/](https://blog.cloudflare.com/syn-packet-handling-in-the-wild/)

~~~
zng00
Wow, these are some great resources. Thanks for sharing! I have a call with
one of your colleagues in 5 minutes ;)

------
dboreham
Not sure the headline is accurate: surely these kernel mechanisms were
invented specifically _to_ allow this functionality? Therefore there is no
abuse. More like “we found a mostly-forgotten netfilter feature designed to do
the thing we’re trying to do, so we used it”.

~~~
jchw
Not exactly. TPROXY is designed for transparent proxying, but because of the
way it works, it can also be used to approximate binding to all TCP ports.
The latter use case is a bit different.

~~~
aidenn0
but they are binding all TCP ports to implement a transparent proxy, no?

~~~
nikanj
Transparent _reverse_ proxy, which very likely was not the originally intended
use case.

------
ttul
TPROXY is totally amazing. We used it to modify nginx to create a transparent
SMTP proxy that scales. Using TPROXY, we can pretend to be millions of ISP
subscriber IPs at once in a single process.

~~~
jlgaddis
In the near future, I'll need to do something likely very similar to what you
did (albeit, probably on a smaller scale). Are there any technical details
about this that you can share or perhaps just some pointers to relevant and/or
helpful documentation?

(N.B.: I won't even be starting on this for probably a month or two so I
haven't even begun to look into it. If there is documentation easily/readily
available via a Google search (i.e., I'll find 'em as soon as I Google for
'em) then just ignore my request. Thanks!)

~~~
ttul
There is a TPROXY mailing list where you can easily get questions answered by
the community if not the original author of the patch.

This Python example sets up a transparent HTTP proxy which will show you the
basic socket stuff you need to get going.

[https://github.com/erijo/transparent-proxy/blob/master/READM...](https://github.com/erijo/transparent-proxy/blob/master/README.md)
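
As a rough idea of the socket side (a sketch only, not taken from the linked project): on Linux the listening socket gets `IP_TRANSPARENT` set before `bind()`, which needs root or CAP_NET_ADMIN. The value 19 comes from `<linux/in.h>`; the function name `make_listener` and the default port 1234 are placeholders, and the matching iptables TPROXY rule and routing setup are not shown:

```python
import socket

# From <linux/in.h>; defined here because not every Python version exports it.
IP_TRANSPARENT = 19


def make_listener(host="127.0.0.1", port=1234):
    """Create a listening TCP socket and try to enable IP_TRANSPARENT.

    Returns (sock, transparent). `transparent` is False when the option
    could not be set (no privileges, or not running on Linux), in which
    case the socket is just an ordinary listener.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    try:
        # IPPROTO_IP has the same value (0) as SOL_IP on Linux.
        sock.setsockopt(socket.IPPROTO_IP, IP_TRANSPARENT, 1)
        transparent = True
    except OSError:
        transparent = False
    sock.bind((host, port))
    sock.listen(128)
    return sock, transparent
```

With a TPROXY rule delivering traffic to this socket, calling `getsockname()` on each accepted connection returns the address and port the client originally targeted, so no extra `getsockopt()` lookup is needed.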

------
sciurus
Discussion for the Spectrum product:
[https://news.ycombinator.com/item?id=16820631](https://news.ycombinator.com/item?id=16820631)

------
freedomben
What happens to this once NFTables takes over? I'm still using iptables in
production, but I'm wary since my understanding is that iptables is sort of
deprecated in favor of NFTables.

~~~
iakie
nftables doesn't support TPROXY.

~~~
freedomben
Right, TPROXY is an iptables module (which implies that without someone to
port it (assuming porting is even possible due to architectural differences),
it isn't going to work on NFTables).

To clarify my original question, what will Cloudflare do if/when iptables
finally goes away? Has thought been put into it? Will they implement their own
type of TPROXY? Will they continue to support iptables themselves? There are
quite a few paths, and I'm interested in which one they deem most optimal
because I respect their opinions a lot.

~~~
iakie
actually, TPROXY is very very lightly coupled with iptables. In fact, you can
directly use TPROXY without iptables.

here's a 50 line kernel module that uses TPROXY to do the same thing without
touching iptables.

[https://pastebin.com/uxUf6MFS](https://pastebin.com/uxUf6MFS)

looking at the nftables code, I think the only reason nftables doesn't support
TPROXY is that no one wrote some of the config parsing / serialization stuff.

~~~
jrochkind1
Sounds like Cloudflare might want to start trying to submit some nftables
TPROXY support now, so it's there in the vanilla kernel when they end up
needing it. :)

~~~
RandomBK
I'd expect someone to eventually submit such a patch, though I don't know how
urgent this issue is. Iptables isn't going anywhere anytime soon, so
Cloudflare can continue to use this method on the edge nodes.

------
riobard
What's the problem with SO_ORIGINAL_DST? Could you please explain a bit why
the code is not encouraging? The author of TPROXY also mentioned somewhere
else that SO_ORIGINAL_DST is racy, but I'm not a kernel developer and don't
understand why. Thanks!

~~~
riobard
Digging deeper, I found more explanation on StackOverflow
[https://stackoverflow.com/a/5814636/184061](https://stackoverflow.com/a/5814636/184061)
(seems to be written by tproxy author Balazs Scheidler judging by the
username).

------
zaarn
I hope this eventually becomes available to everyone (even if in a limited
fashion).

Being able to set up a Gitlab/Gitea server behind Cloudflare without having to
hack around the SSH port limitation would be fun.

------
madez
> For completeness, there is also a sysctl net.ipv6.ip_nonlocal_bind, but we
> don't recommend touching it.

Any ideas for an explanation of this recommendation?

------
kazinator
> _Well, we can't ever know what the world looks like through another
> species' eyes_.

s/species'/person's/

