
Ask HN: How is DDoS protection implemented? - elephant0xffff
The big services (Google, Cloudflare, etc.) provide DDoS attack mitigation (and seem to succeed), but details on their tactics are rare (at least I did not find in-depth information on that).

I guess to make this work well you have to do classification (regular request vs. malicious) on several protocol layers and then reroute or drop packets accordingly. But how does that prevent severe service degradation - you still have to do some kind of work (in computation and energy) on the listening side or can fat edge-servers just eat that up?
======
tptacek
I was lead developer on Arbor Networks' DDoS product in the early 2000s (I
left in 2005 to start Matasano Security). My information on this is surely
dated, but people seem to still be using the same terminology now as then.

You can break down DDoS into roughly three categories:

1\. Volumetric (brute force)

2\. Application (targeting specific app endpoints)

3\. Protocol (exploiting protocol vulnerabilities)

DDoS mitigation providers concentrate on 1 & 3.

The basic idea is: attempt to characterize the malicious traffic if you can,
and/or divert all traffic for the target. Send the diverted traffic to a
regional "scrubbing center"; dirty traffic in, clean traffic out.

The scrubbing centers buy or build mitigation boxes that take large volumes of
traffic in and then do heuristic checks (liveness of sender, protocol
anomalies, special queueing) before passing it to the target. There's some
in-line layer 7 filtering happening, and there's continuous source
characterization happening to push basic network layer filters back towards
ingress.

You can do pretty simple statistical anomaly models and get pretty far with
attacker source classification, with tracking targets, and with being
selective about what things need to be diverted.
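
To make "pretty simple statistical anomaly models" concrete, here is a minimal sketch of one possible model: an exponentially weighted baseline of a per-target traffic rate that flags samples far above it. The class name, the smoothing factor, the threshold, and the warm-up length are all assumptions for illustration, not what any particular product does.

```python
from dataclasses import dataclass


@dataclass
class RateBaseline:
    """Exponentially weighted mean/variance of a traffic rate for one target."""
    alpha: float = 0.1       # smoothing factor (assumed value)
    threshold: float = 4.0   # deviations above baseline that count as anomalous
    warmup: int = 5          # samples to learn from before flagging anything
    mean: float = 0.0
    var: float = 0.0
    samples: int = 0

    def observe(self, rate: float) -> bool:
        """Feed one rate sample (e.g. packets/sec); return True if anomalous."""
        if self.samples < self.warmup:
            anomalous = False
        else:
            # Floor the deviation so a perfectly flat baseline is not hair-trigger.
            std = max(self.var ** 0.5, 1.0)
            anomalous = rate > self.mean + self.threshold * std
        if not anomalous:
            # Learn only from normal samples so an attack does not become
            # the new baseline.
            if self.samples == 0:
                self.mean = rate
            else:
                delta = rate - self.mean
                self.mean += self.alpha * delta
                self.var = (1 - self.alpha) * (self.var + self.alpha * delta * delta)
            self.samples += 1
        return anomalous
```

A target whose rate trips the detector would be a candidate for diversion; everything else keeps flowing on its normal path.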

A lot of major volumetric attacks are, at the network layer, pretty
unsophisticated; they're things like memcached or NTP floods. When you're
special-casing traffic to a particular target through a scrubbing center, it's
pretty easy to strip that kind of stuff off.

~~~
pvg
What happens when it doesn't work? For instance why does something like Mirai
happen? The first D is too D?

~~~
verroq
You call up Krebs and the FBI and they'll dox/arrest the attacker.

~~~
pvg
No.

------
jedberg
I worked on the eBay DDoS prevention system in the early 2000s. My coworkers
filed a patent on part of the system.

[https://patents.google.com/patent/US7992192](https://patents.google.com/patent/US7992192)

Once the traffic was detected, the signature was sent to a second system that
was a series of hardware devices optimized for layer 7 packet inspection. The devices
were updated with signatures of current attacks, and then checked every
incoming packet for that signature. Any packet that matched was parsed for
where it was coming from, and then the router was updated to drop traffic from
that source for a period of time.
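
As a rough software sketch of that loop (match a signature, then drop the source for a while): the class below treats a signature as a plain byte substring and keeps the block table in a dict. Both are assumptions for illustration; the real system used dedicated hardware and router updates, and its signature format is not described here.

```python
import time
from typing import Dict, Iterable, Optional


class TemporaryBlocklist:
    """Drop packets matching a signature, then drop their source for a while."""

    def __init__(self, signatures: Iterable[bytes], block_seconds: float = 300.0):
        self.signatures = list(signatures)      # hypothetical: byte substrings
        self.block_seconds = block_seconds
        self.blocked_until: Dict[str, float] = {}  # source IP -> expiry time

    def is_blocked(self, src_ip: str, now: float) -> bool:
        expiry = self.blocked_until.get(src_ip)
        if expiry is None:
            return False
        if now >= expiry:
            # Block has expired, so forget the source again.
            del self.blocked_until[src_ip]
            return False
        return True

    def inspect(self, src_ip: str, payload: bytes, now: Optional[float] = None) -> bool:
        """Return True if the packet should be dropped."""
        now = time.monotonic() if now is None else now
        if self.is_blocked(src_ip, now):
            return True
        if any(sig in payload for sig in self.signatures):
            self.blocked_until[src_ip] = now + self.block_seconds
            return True
        return False
```

The expiry matters: sources are only blocked "for a period of time", so an address that stops sending matching traffic eventually gets through again.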

As far as I know, today's techniques are fairly similar, along with just
having a whole lot of computers that can absorb the traffic.

~~~
grepthisab
What does a "signature" look like specifically, or generally if you can't be
specific? Would love to hear about what is actually getting sent to the L7
optimized hardware.

~~~
tptacek
In the early 2000s you could get a long way with just the 5-tuple, some basic
aggregation inference, and a RRD histogram. The tricky parts were having the
ability to divert and process the traffic once characterized. The actual
processing wasn’t that complicated; it just needed way bigger rules than could
be fit in a switch TCAM.
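
A minimal sketch of what aggregating over the 5-tuple can look like: sum bytes per chosen subset of fields and look at the heaviest aggregates. The `Flow` type, the function name, and the sample traffic are hypothetical, and the RRD histogram part is left out.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class Flow(NamedTuple):
    """The classic 5-tuple identifying a flow."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: str


def top_aggregates(
    flows: Iterable[Tuple[Flow, int]],
    fields: Tuple[str, ...],
    n: int = 3,
) -> List[Tuple[tuple, int]]:
    """Sum bytes per value of the chosen 5-tuple fields; return the n largest."""
    totals: Counter = Counter()
    for flow, nbytes in flows:
        key = tuple(getattr(flow, field) for field in fields)
        totals[key] += nbytes
    return totals.most_common(n)


# Hypothetical sample: two memcached reflectors and one ordinary HTTPS client.
sample = [
    (Flow("198.51.100.1", "192.0.2.10", 11211, 40000, "udp"), 1400),
    (Flow("198.51.100.2", "192.0.2.10", 11211, 40001, "udp"), 1400),
    (Flow("203.0.113.5", "192.0.2.10", 51000, 443, "tcp"), 600),
]
heaviest = top_aggregates(sample, ("dst_ip", "src_port", "proto"), n=1)
```

Grouping by destination, source port, and protocol collapses many reflector addresses into one aggregate, which is the kind of rule that is cheap to push into a filter.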

------
AgentK20
By far the biggest part of attack mitigation in my experience is out-scaling
the attack. A well written and configured application stack can handle a
decent amount of traffic itself before becoming bogged down processing
malicious traffic, but at some point you'll cap out either the application,
the NIC, the upstream switch, the router, or the ISP line, if your application
is running in just one place. To get around that, huge providers like the ones
you listed are heavily multi-homed. This means they announce their traffic
routes to the internet from multiple locations, so traffic naturally flows to
the closest (hop-wise, not necessarily geographically speaking) endpoint.

From there, you can add layers of protection ranging from simple things like
blocking traffic that is obviously malicious (TCP flags, port numbers, etc) to
more complex things like pattern recognition in both the overall trends of the
data and on a per-packet basis. After you've decided with a decent certainty
that it's not malicious traffic, you pass it off to the actual backend
service.
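
The "obviously malicious" first layer might look something like the sketch below: reject TCP flag combinations that no normal stack sends, and UDP arriving from ports commonly abused for reflection. The port set and function names are assumptions, and the UDP rule only makes sense in front of a service that never expects such traffic (for example a web-only frontend).

```python
# TCP flag bits as they appear in the TCP header
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20
ALL_FLAGS = FIN | SYN | RST | PSH | ACK | URG

# Assumption: UDP source ports of services often used for reflection
# (NTP and memcached, both mentioned elsewhere in this thread).
REFLECTION_SOURCE_PORTS = {123, 11211}


def bad_tcp_flags(flags: int) -> bool:
    """True for flag combinations that a normal TCP stack does not send."""
    if (flags & ALL_FLAGS) == 0:
        return True                        # no flags at all ("null" packet)
    if (flags & SYN) and (flags & FIN):
        return True                        # open and close at the same time
    if (flags & SYN) and (flags & RST):
        return True                        # open and reset at the same time
    return False


def drop_early(proto: str, src_port: int, tcp_flags: int = 0) -> bool:
    """Cheap stateless check run before any per-connection work."""
    if proto == "udp":
        return src_port in REFLECTION_SOURCE_PORTS
    if proto == "tcp":
        return bad_tcp_flags(tcp_flags)
    return False
```

Checks like these are stateless, so they can run before any pattern recognition and before the backend sees anything.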

For systems that are designed to scale horizontally, that may be a neighboring
machine (or even the same machine) in that data center. For single-homed
backend systems that can't scale horizontally to multiple locations, that
"clean" traffic is then sent via some mechanism (possibly a GRE tunnel,
possibly just raw internet traffic to a secret IP) to the backend service.
Depending on the methodology used, the filtering may be a true bidirectional
proxy, in which case the reply goes back to the scrubber and then out to the
original sender, or it may be a unidirectional proxy, in which case the reply
goes directly back to the original sender.

All attack mitigation works in some way like this, whether it be by designing
your application from the beginning to be multi-homed and able to run in
multiple datacenters, or by installing a separate mitigation layer that scrubs
attack traffic.

------
bArray
From my personal low-end server perspective (which has stood up to simple
attacks from Russian IPs), I have the following:

1\. Static page caching (in RAM ideally) - dynamically generated content will
kill you quicker than anything else, especially calls to a database. WordPress
is very easy to kill in its default state.

2\. Kill high frequency requests from the same location as quickly as possible
(make sure your response is less than the data they send you - ultimately you
want their systems to be busier than yours). You want to free the port up as
quickly as possible.

3\. Move anybody you can identify as a legitimate user (credentials, low
frequency requests) out to another server if possible.
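
For point 2, one way to spot high-frequency sources is a sliding-window counter per source address. This is only a sketch: the class name and limits are made up, and a real deployment would bound the table size and do this as early in the network path as it can.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class SourceRateLimiter:
    """Allow at most max_requests per source within a sliding time window."""

    def __init__(self, max_requests: int = 20, window_seconds: float = 1.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # source -> timestamps of its recently accepted requests
        self.recent: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, src: str, now: float) -> bool:
        timestamps = self.recent[src]
        # Forget requests that have fallen out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_requests:
            return False        # too frequent: reject as cheaply as possible
        timestamps.append(now)
        return True
```

Rejected requests are not recorded, so a source that slows down is let back in once its earlier requests age out of the window.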

Firewall wise, my system sits on the cloud, so usually high frequency traffic
is the only issue I have to deal with. Interested to hear any advice of other
people here.

~~~
bo1024
For #2 -- how do you "kill" high frequency requests? By ignoring them?

~~~
colanderman
Yep. Add the source address (or some more specific yet easily computed
identifier) to a table that is checked early in the network path (in hardware
if possible).

Or, if you want to be fancy, "tarpit" them (complete TCP handshake and _then_
ignore, forcing attacker to actually commit resources), but apparently that's
of questionable value these days. [1]

[1]
[https://en.wikipedia.org/wiki/Tarpit_(networking)](https://en.wikipedia.org/wiki/Tarpit_\(networking\))

~~~
bo1024
Interesting, thanks.

------
oneplane
It's essentially still the same thing: having the bigger pipe.

A distributed DoS attack has many sources, and when including botnets on
infected consumer systems you have legitimate source addresses/devices as
well. This defeats most "blackhole the source" options as the source is the
same thing as legitimate visitors/customers.

So for a DDoS that simply tries to saturate your link(s) and where you can't
blackhole the source, the only 'protection' is having more bandwidth than the
attacker(s) has (or have).

After that a few other things come into play: attack traffic from legit
sources may have a pattern, so while you can't blackhole upstream, you can
prevent traffic with that pattern from getting to the actual application/site. This is
relevant in cases where you might suffer from application overload before link
overload. If your link can handle the DDoS traffic but your application can't,
you're still screwed. (and with application I include load balancers,
databases, storage etc.)

------
gbrayut
Fastly had a good presentation about DDoS trends and how they mitigate them at
one of their recent Altitude conferences. Video at
[https://vimeo.com/212305516](https://vimeo.com/212305516) and the mitigation
stuff starts around 14:45

------
NightlyDev
The easy answer: Load balancing

Anycast is the most important piece of the puzzle, allowing you to route
traffic to a bunch of different locations.

Let's say you can handle 10 Gbps at a single location. If the traffic is
evenly split between 100 destinations then you can have a single IP that can
handle 1 Tbps of traffic.
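
The arithmetic, plus what happens when the split is not even, as a small sketch. The function names are made up, and the share of attack traffic landing on each site is an input here; in practice it depends on where the attacking machines sit relative to the anycast locations.

```python
from typing import Sequence


def aggregate_capacity_gbps(per_site_gbps: float, sites: int) -> float:
    """Total capacity behind one anycast IP if traffic splits perfectly evenly."""
    return per_site_gbps * sites


def max_absorbable_attack_gbps(per_site_gbps: float, shares: Sequence[float]) -> float:
    """Largest total attack that saturates no site.

    shares[i] is the fraction of the attack that routing delivers to site i.
    The site receiving the largest share is the first to saturate.
    """
    return min(per_site_gbps / share for share in shares if share > 0)


even_split = [1 / 100] * 100          # 100 sites, 1% of the attack each
skewed_split = [0.5] + [0.5 / 99] * 99  # one site receives half of the attack
```

With the even split this gives roughly 1000 Gbps (the 1 Tbps above); with the skewed split the busiest site saturates at a 20 Gbps attack, so the even-split figure is an upper bound.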

Of course, the setup behind these IPs might vary a lot, and one might even use
DNS load balancing in front of the IPs.

~~~
nkozyra
Load balancing is in place for all but the most trivial sites, though, so what
you're really saying is horizontal scaling. Which is fine but expensive
compared to pattern based mitigation techniques.

~~~
kpcyrd
I don't think this is about regular load balancing. DDoS is coming from a
large number of infected machines, but they can't control how their traffic is
routed. By using anycast you're splitting the machines that are used to attack
into small groups that your pattern based mitigation or even your regular
reverse proxies can handle.

~~~
penagwin
CDN networks are well equipped for this because of their large geographical
footprint. If they can terminate "bad" requests closer to their origin then
they don't add up nearly as badly for the application server.

------
rdl
One "trick" to know is that transit links are generally billed as the higher
of inbound or outbound traffic. If you have a service which is unbalanced and
pushing out a lot of data (like most hosting services), your inbound is thus
essentially "free" up to a very high volume.
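
A tiny sketch of that billing rule, assuming a simplified contract that bills on the larger of the two directions (real contracts vary, and the function names are made up):

```python
def billable_mbps(inbound_mbps: float, outbound_mbps: float) -> float:
    """Simplified transit bill basis: the higher of inbound or outbound."""
    return max(inbound_mbps, outbound_mbps)


def free_inbound_headroom_mbps(inbound_mbps: float, outbound_mbps: float) -> float:
    """Extra inbound traffic that would not raise the billable rate."""
    return max(outbound_mbps - inbound_mbps, 0.0)
```

For a host pushing 5000 Mbps out and taking 200 Mbps in, the billable rate stays at 5000 Mbps until inbound traffic grows by more than 4800 Mbps.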

------
togusa2017
This might give you a nice idea of how HAProxy provides the feature.
[https://www.haproxy.com/blog/use-a-load-balancer-as-a-
first-...](https://www.haproxy.com/blog/use-a-load-balancer-as-a-first-row-of-
defense-against-ddos/)

------
majke
I'm working on DDoS protection at Cloudflare. AMA

We try to publish most of what we do, the more obvious links:

[https://blog.cloudflare.com/how-cloudflares-architecture-
all...](https://blog.cloudflare.com/how-cloudflares-architecture-allows-us-to-
scale-to-stop-the-largest-attacks/)

[https://blog.cloudflare.com/meet-gatebot-a-bot-that-
allows-u...](https://blog.cloudflare.com/meet-gatebot-a-bot-that-allows-us-to-
sleep/)

[https://blog.cloudflare.com/the-root-cause-of-large-ddos-
ip-...](https://blog.cloudflare.com/the-root-cause-of-large-ddos-ip-spoofing/)

[https://blog.cloudflare.com/memcrashed-major-
amplification-a...](https://blog.cloudflare.com/memcrashed-major-
amplification-attacks-from-port-11211/)

[https://blog.cloudflare.com/syn-packet-handling-in-the-
wild/](https://blog.cloudflare.com/syn-packet-handling-in-the-wild/)

[https://blog.cloudflare.com/reflections-on-
reflections/](https://blog.cloudflare.com/reflections-on-reflections/)

[https://blog.cloudflare.com/say-cheese-a-snapshot-of-the-
mas...](https://blog.cloudflare.com/say-cheese-a-snapshot-of-the-massive-ddos-
attacks-coming-from-iot-cameras/)

[https://blog.cloudflare.com/the-new-ddos-
landscape/](https://blog.cloudflare.com/the-new-ddos-landscape/)

[https://blog.cloudflare.com/unmetered-
mitigation/](https://blog.cloudflare.com/unmetered-mitigation/)

[https://blog.cloudflare.com/introducing-the-p0f-bpf-
compiler...](https://blog.cloudflare.com/introducing-the-p0f-bpf-compiler/)

And many more.

Also two talks:

[https://idea.popcount.org/2016-02-01-enigma---building-a-
dos...](https://idea.popcount.org/2016-02-01-enigma---building-a-dos-
mitigation-pipeline/)

[https://idea.popcount.org/2015-11-16-black-hat-eu---
defendin...](https://idea.popcount.org/2015-11-16-black-hat-eu---defending-
the-indefensible/)

> But how does that prevent severe service degradation

It doesn't. You DROP the most specific thing you can. To avoid collateral
damage we are able to do "Scattering" (move client across IPs with the hope
the attack won't follow), and for example apply the controversial limits only
in certain geographical areas (anycast network allows this).

> you still have to do some kind of work (in computation and energy) on the
> listening side

Yes. BPF for L3 works like a charm. Read up on XDP.

> or can fat edge-servers just eat that up?

Yes and no. You have to specifically optimize; whatever you do probably won't
make Apache or IIS work under DDoS. Most vendors use "scrubbing centres", where
they can have a small number of beefy dedicated servers. We didn't find this
architecture sufficient though, so in our case edge servers do handle the
load. But we do spend time on tuning the servers and our applications.

------
amorphid
One way these companies mitigate DDoS attacks is by being huge. If you have a
small house w/ one entrance, there's no great way to manage 1000 people trying
to get through the front door. If you have a huge house w/ dozens of
entrances, dealing with 1000 people trying to get in the building is much more
manageable :)

From
[https://en.wikipedia.org/wiki/DDoS_mitigation](https://en.wikipedia.org/wiki/DDoS_mitigation):

 _One technique is to pass network traffic addressed to a potential target
network through high-capacity networks with "traffic scrubbing" filters._

~~~
oneplane
But if all those people get in and all try to get into the single elevator in
the building it will be a problem :p (link DoS vs. application DoS)

------
contingencies
Not an expert but I would guess at least the following: traffic filtering,
peer traffic filtering by (possibly dynamic and automated) agreement, traffic
classification and anomaly detection (DNS/TCP/HTTP(S)/etc.), routing different
clients (based on origin AS and/or geolocation) to different IPs through DNS,
hosted web frontends, web-level active user challenges, potentially
dynamically altering the advertisement of routes, and by charging so much
money the moment you need to use them that buying extra bandwidth and
netblocks isn't an issue for them. Probably some of them also drop to high-
overhead traffic reduction modes which can expand frontend IPs and DNS
response segmentation, dropping DNS TTLs and spinning up new proxy systems in
order to better filter out robotic attackers. Many also probably
create/profile/buy various browser fingerprinting techniques and may have a
library of non publicly disclosed approaches available for additional
mitigation during high bandwidth attacks. Oh yeah, and replicating a static
cache as a cheap means of degraded service provisioning.

------
jgrahamc
We've open sourced and talked about a lot of how we do DDoS mitigation.
Details are in the following blog posts:

No Scrubs: The Architecture That Made Unmetered Mitigation Possible -
[https://blog.cloudflare.com/no-scrubs-architecture-
unmetered...](https://blog.cloudflare.com/no-scrubs-architecture-unmetered-
mitigation/)

Meet Gatebot - a bot that allows us to sleep -
[https://blog.cloudflare.com/meet-gatebot-a-bot-that-
allows-u...](https://blog.cloudflare.com/meet-gatebot-a-bot-that-allows-us-to-
sleep/)

How Cloudflare's Architecture Allows Us to Scale to Stop the Largest Attacks -
[https://blog.cloudflare.com/how-cloudflares-architecture-
all...](https://blog.cloudflare.com/how-cloudflares-architecture-allows-us-to-
scale-to-stop-the-largest-attacks/)

Kernel bypass - [https://blog.cloudflare.com/kernel-
bypass/](https://blog.cloudflare.com/kernel-bypass/)

SYN packet handling in the wild - [https://blog.cloudflare.com/syn-packet-
handling-in-the-wild/](https://blog.cloudflare.com/syn-packet-handling-in-the-
wild/)

How to achieve low latency with 10Gbps Ethernet -
[https://blog.cloudflare.com/how-to-achieve-low-
latency/](https://blog.cloudflare.com/how-to-achieve-low-latency/)

How to receive a million packets per second -
[https://blog.cloudflare.com/how-to-receive-a-million-
packets...](https://blog.cloudflare.com/how-to-receive-a-million-packets/)

Introducing the BPF Tools - [https://blog.cloudflare.com/introducing-the-bpf-
tools/](https://blog.cloudflare.com/introducing-the-bpf-tools/)

BPF - The Forgotten Bytecode - [https://blog.cloudflare.com/bpf-the-forgotten-
bytecode/](https://blog.cloudflare.com/bpf-the-forgotten-bytecode/)

Introducing the p0f BPF compiler - [https://blog.cloudflare.com/introducing-
the-p0f-bpf-compiler...](https://blog.cloudflare.com/introducing-the-p0f-bpf-
compiler/)

Single RX queue kernel bypass in Netmap for high packet rate networking -
[https://blog.cloudflare.com/single-rx-queue-kernel-bypass-
wi...](https://blog.cloudflare.com/single-rx-queue-kernel-bypass-with-netmap/)

------
WhiteSource1
You can see a DDoS attack live for a demo of how it works:
[https://www.youtube.com/watch?v=FIQUUFVE6tU](https://www.youtube.com/watch?v=FIQUUFVE6tU)

They are also doing a webinar (apologies for the link) so you can see exactly
how it's implemented: [https://www.incapsula.com/blog/want-to-see-what-a-live-
ddos-...](https://www.incapsula.com/blog/want-to-see-what-a-live-ddos-attack-
looks-like.html)

------
vlan0
BGP Flowspec is commonly used by ISPs. Sadly, they won’t extend that to their
customers.

They’d rather sell yet another service than support open protocols.

~~~
majke
There are two! There are two ISPs that allow customers to send Flowspec to
their backbone!
[https://twitter.com/flockforward/status/909090299724664832](https://twitter.com/flockforward/status/909090299724664832)

------
trelliscoded
I use remote black hole routing announcements to the upstream ISPs to filter
source or destination addresses from traversing the congested link.

~~~
exikyut
I wonder where you are, very vaguely speaking. (Just in case people might be
near (downstream of) you unaware that they could take advantage of these
announcements.) Maybe this is a tricky question (because of domain
nontriviality, or because of privacy), which is fine.

PSA: this user's profile definitely deserves reading, everyone go look

------
yeukhon
I wonder if anyone has ever tried a counterattack. The downside is that you in
turn DoS the origin, which is often a victim itself, like an infected host in a
botnet. Double-edged sword. But it would be very interesting to see how quickly
one could defeat the attack.

I also wonder why attacks often last only a few hours.

~~~
Analemma_
1\. That would be just as illegal as the original attack; cybersecurity laws
have no provisions for self-defense. (It's true that nations are attempting to
negotiate clauses like that in international relations, but even if that pans
out, it will definitely never be a privilege afforded to individuals)

2\. Attack what? It's a _distributed_ DoS, the calls are coming from all over.
You mean going after every node sending traffic? What would "attacking them"
even mean? It's not like you can shut them down.

3\. All those nodes are innocent and being used unknowingly. Attacking them
would be both illegal (see point 1) and pretty unethical: you're deliberately
aiming at innocents and not the attacker (whom you have no chance of
locating). Imagine if you took down a hospital attempting to stop an NTP flood
on your dumb blog. Have fun explaining why that was necessary.

"Counter-hacking" sounds cool and sexy, but there are reasons why it is never
done.

~~~
zerostar07
I know next to nothing about botnets, but I wonder if you could divert traffic
from botnets to a fake server that does nothing other than trying to keep the
connection open for as long as possible (or being super slow in general) in
order to increase the number of open connections from the bot's side, in order
to stall it from opening new connections or make it slow in general.

~~~
yeukhon
Usually one would set up a sinkhole to divert traffic away from real hosts, but
I think keeping these connections open won't help much: the bots usually just
send packets and disconnect. Smurf attacks and SYN floods are the classic examples.

