We have many layers of protection:
* We run iptables and an API we wrote on our ingest servers. We run fail2ban on a separate set of servers. When fail2ban sees something, we have it hit the API, which adds the iptables rules. This offloads fail2ban's CPU cost from our ingest servers.
* We block groups of IP ranges belonging to known hosting companies, like DigitalOcean and Linode. These were common sources of attacks.
* All of our services have rate limits, which we enforce per IP.
* We have monitoring and auto-scaling with service-level granularity, which responds pretty quickly when needed.
* We recently moved behind Cloudflare because Google Cloud did not protect us from attacks like the UDP floods, which didn't even reach our servers.
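To make the blocklist and per-IP rate limit layers above a bit more concrete, here is a minimal Python sketch. It is not our actual code: the class and function names, the token-bucket approach, and the numbers are all assumptions for illustration, and the CIDR range is a documentation placeholder (192.0.2.0/24), not a real hosting provider block.

```python
import ipaddress
import time

# Placeholder range only (RFC 5737 documentation block), NOT a real
# hosting-company range. A real list would be loaded from somewhere.
BLOCKED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def is_blocked(ip, networks=BLOCKED_NETWORKS):
    """Return True if the address falls inside any blocked CIDR range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


class PerIPRateLimiter:
    """Hypothetical token-bucket limiter keyed by client IP."""

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self.rate = float(rate_per_sec)   # tokens refilled per second
        self.burst = float(burst)         # maximum bucket size
        self.clock = clock                # injectable for testing
        self._buckets = {}                # ip -> (tokens, last_seen)

    def allow(self, ip):
        """Spend one token for this IP; return False if the bucket is empty."""
        now = self.clock()
        tokens, last = self._buckets.get(ip, (self.burst, now))
        # Refill based on elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self._buckets[ip] = (tokens - 1.0, now)
            return True
        self._buckets[ip] = (tokens, now)
        return False


def should_accept(ip, limiter):
    """Blocklist check first (cheap), then the per-IP rate limit."""
    return not is_blocked(ip) and limiter.allow(ip)
```

In a setup like ours the blocklist part would more likely live in iptables than in application code, and a real limiter would also need to expire idle buckets so the dict does not grow without bound. The point is only that each layer is a cheap check that can reject a request before it costs anything real.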
If the attackers are persistent, there is really no way to guarantee zero downtime. THEY WILL FIND A WAY. Just make sure your stakeholders know you are doing everything in your power to resolve the issues, and then actually do those things.
We had been seeing DDoS attacks for a few weeks, so we had most everything locked down and working. But then suddenly one of the most important parts of our site started going down under load. That part is a real-time chat system. We looked for which chat room had the load, and it was one that did not require users to be registered. We switched the room into registered-users-only mode and thought we had solved it.
About 5 minutes later the attack came back, this time with all registered users. We were amazed, because there is no way the attackers could have registered that many accounts in 5 minutes given our rate limiting on registration. Turns out they had spent the past week or so registering users in case they needed them :)
We have some controversial users...
curl https://184.108.40.206 -H 'Host: www.stream.me' -v -k
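Btw, check out curl's --resolve flag. It takes host:port:address and overrides default DNS resolution for that host, so you can put the real hostname in the URL. Certificate verification then runs against the hostname instead of the bare IP, and you can drop the -k flag.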