
Jerks on the Internet: what my first DDoS taught me - sergiomattei
https://sergiomattei.com/posts/handling-the-jerks/
======
Animats
If you have some kind of expensive request, use fair queuing by IP address. If
someone has a request pending, further requests from the same source are queued
behind IP addresses with fewer pending requests. So each IP address competes
with itself, not with others.

For some reason, this isn't done much. I have it on a site of mine. I didn't
notice for a week that someone was making a huge number of requests and not
even waiting for the task to complete. It didn't hurt anything.
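
A minimal sketch of the idea (my own illustration, not Animats' actual
implementation): one FIFO per client IP, served round-robin, so a greedy
client only ever waits behind itself.

    # Per-IP fair queuing: each IP gets its own FIFO, and the dispatcher
    # cycles across IPs round-robin, so a flood from one address only
    # delays that address's own requests.
    from collections import OrderedDict, deque

    class FairQueue:
        def __init__(self):
            self.queues = OrderedDict()  # ip -> deque of pending requests

        def enqueue(self, ip, request):
            self.queues.setdefault(ip, deque()).append(request)

        def dequeue(self):
            """Pop the next request, rotating across IPs."""
            if not self.queues:
                return None
            ip, pending = next(iter(self.queues.items()))
            request = pending.popleft()
            del self.queues[ip]
            if pending:  # IP still has work: move it to the back of the rotation
                self.queues[ip] = pending
            return request

A worker loop calling dequeue() then picks up one request per IP per round.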

~~~
veryworried
It’s not used because any serious attack is going to come from multiple
unrelated sources: think a botnet full of compromised IoT devices hitting your
server with 20TB a second worth of requests. So you might as well plan for
that scenario instead.

~~~
Animats
Those requests seldom get far enough to start significant server activity.
It's the ones that look like legit requests that are the problem.

~~~
veryworried
Seldom isn’t good enough. When it comes to security you have to be right 100%
of the time. An attacker only has to be right once. Good luck.

~~~
montenegrohugo
That's not true. Security is always a trade-off between effort invested and
the probability of a breach.

There is no 100% secure system.

------
jmchuster
Hope 10% of this might be useful to you.

1) The three most important metrics for any endpoint are error rate, latency,
and throughput. So I hope you've learned not to be surprised that abnormal
throughput (whether via DDoS attacks or friendly N+1 queries) is a common
error condition.

2) Banning IP addresses is useless and often counterproductive. If possible,
short-circuit requests from an IP so you can isolate them and then determine
how much damaging knowledge they have (while letting them think they're still
hitting your real service).

3) Feature development is a great goal, but don't forget that the most
important feature is availability. Spending an extra 30 minutes to consider
things like pagination ends up being worth it if you think you might get
attacked more than 0 times per year.
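
For Django REST Framework specifically (which the site appears to use, per
comments elsewhere in the thread), global pagination is a small settings
change; a minimal sketch:

    # settings.py -- enable pagination globally in Django REST Framework,
    # so no single request can pull an entire table in one query.
    REST_FRAMEWORK = {
        "DEFAULT_PAGINATION_CLASS": "rest_framework.pagination.PageNumberPagination",
        "PAGE_SIZE": 50,  # illustrative cap on objects returned per request
    }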

~~~
kristiandupont
>short-circuit requests from an ip

What does that mean?

~~~
zoul
I’d guess serving them a static placeholder or cached result to prevent them
from hammering the DB?
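
Presumably something along those lines. A hypothetical Django middleware
illustrating the guess (the flagged-IP list and canned payload are made up
for illustration):

    # Flagged IPs get a static, plausible-looking response instead of
    # reaching the views, so they never touch the database but still
    # think they're hitting the real service.
    from django.http import JsonResponse

    FLAGGED_IPS = {"203.0.113.7"}  # illustrative; load from config in practice

    class ShortCircuitMiddleware:
        def __init__(self, get_response):
            self.get_response = get_response

        def __call__(self, request):
            if request.META.get("REMOTE_ADDR") in FLAGGED_IPS:
                return JsonResponse({"results": [], "count": 0})  # canned payload
            return self.get_response(request)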

------
yoongkang
Good steps to take in the article. But I'd also add that Django REST
Framework, which they seem to be using, has throttling capabilities built in,
which I would have attempted before changing the API:
[https://www.django-rest-framework.org/api-guide/throttling/](https://www.django-rest-framework.org/api-guide/throttling/)

Adding pagination seems reasonable, but it may have broken clients who didn't
expect pagination to be there.

~~~
sergiomattei
Hi! Thanks for the recommendation; I did go with that approach.

------
kuroguro
Consider limiting HTTP access to Cloudflare's IP ranges. Otherwise an attacker
can look up DNS history to find the origin server's real IP address and hit it
directly, bypassing Cloudflare entirely:

    curl -k https://134.209.46.107/products/ -H "Host: api.getmakerlog.com"
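
One way to enforce that at the application layer (a sketch, not kuroguro's
setup; the ranges shown are placeholders for Cloudflare's published list at
https://www.cloudflare.com/ips/):

    # Reject requests that didn't arrive via Cloudflare's published IP
    # ranges, so an attacker who digs up the origin IP gets nothing.
    import ipaddress

    from django.http import HttpResponseForbidden

    CLOUDFLARE_NETS = [ipaddress.ip_network(n) for n in (
        "173.245.48.0/20",   # placeholder entries; load the real list
        "103.21.244.0/22",   # from https://www.cloudflare.com/ips/
    )]

    class CloudflareOnlyMiddleware:
        def __init__(self, get_response):
            self.get_response = get_response

        def __call__(self, request):
            addr = ipaddress.ip_address(request.META["REMOTE_ADDR"])
            if not any(addr in net for net in CLOUDFLARE_NETS):
                return HttpResponseForbidden()
            return self.get_response(request)

Blocking at the firewall is stronger, but even this would stop the curl
request above cold.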

~~~
jakejarvis
Seconded. Argo is also great and makes this even easier (for $5/month).

[https://www.cloudflare.com/products/argo-tunnel/](https://www.cloudflare.com/products/argo-tunnel/)

------
dgudkov
>With growth also come the assholes

That pretty much describes the whole history of the internet.

~~~
jjgreen
... the whole history _of our species._

------
megablast
> Therefore, when requesting the endpoint, a massive SQL request would be
> made, freezing the server while the items were fetched + serialized into
> JSON (a Django REST Framework performance weak point).

No caching?

~~~
adventured
I was surprised to not see that in the things I will do / fix list. Caching
those JSON API endpoints would have dramatically changed the ability to absorb
that DDoS. If merely adding pagination brought it back to functioning (from
100% CPU to ~60%), it wasn't a very large attack and caching would have
trivially handled it. Either way, even with Cloudflare and pagination, they
should prioritize adding caching on the API at some point in the near-term.
The relief on the database will be considerable and it'll buy a lot of API
usage growth runway at almost no cost.

Since they're already using Nginx, if they don't want to bother with learning
anything else, it's a couple of hours of research to learn how to set up rock
solid basic caching using Nginx. It'll quickly get you 85% of the way on
caching, until you need something better. Set Nginx loose to do one of the
things it's very good at.
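
At the Django level (as opposed to the Nginx proxy caching described above),
a minimal sketch of caching an expensive read-only endpoint; the model and
serializer names are hypothetical:

    # Cache the rendered response for 60 seconds, so repeat hits are
    # served from the cache backend (Redis, memcached, ...) instead of
    # re-running the query + serialization every time.
    from django.utils.decorators import method_decorator
    from django.views.decorators.cache import cache_page
    from rest_framework.response import Response
    from rest_framework.views import APIView

    from myapp.models import Task                 # hypothetical model
    from myapp.serializers import TaskSerializer  # hypothetical serializer

    class TaskListView(APIView):
        @method_decorator(cache_page(60))  # TTL in seconds; tune to taste
        def get(self, request):
            tasks = Task.objects.all()[:50]
            return Response(TaskSerializer(tasks, many=True).data)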

------
rlue
Can anyone shed some light on why someone would go out of their way to conduct
an attack like this? Is DoSing production web applications just a hobby for
black hat jackasses with nothing better to do?

~~~
derpherpsson
It's fun. Or maybe you are angry. Or maybe you just want to test whether you
can break it. Or maybe you just want the programmers to feel sad, maybe
because they were boring.

~~~
saiya-jin
Fun in the same way as trashing your neighbor's car just because you like to
see sparks and glass shards flying all around. Rather, an indication of a sad,
frustrated life.

~~~
heavenlyblue
>> to see sparks and glass shards flying all around

Well, no. If you _only_ liked sparks and glass shards flying all around,
you'd buy yourself a car and destroy it yourself.

Destroying the car of a neighbour implies something going on between you and
the neighbour.

------
paulie_a
>which hosts other in-development apps too

Don't do that

>I trust my users completely.

Don't do that

>Prioritize bugfixes over new features

While that is a nice thought, it is unlikely to be so simply followed. "The
road to hell is paved with good intentions."

>people editing other’s tasks for example, haha

That's a very cavalier attitude to take. Quite frankly, that should have been
baked in from the get-go, with tests to verify it.

You see this situation as someone being a jerk. Someone could have
accidentally done the same thing due to the lack of planning on multiple
levels.

~~~
sergiomattei
It was actually all unit tested; bugs happen though, and that one was a
particularly nasty one.

It was patched, and there was no evidence that anyone ever used it.

------
eastdakota
Glad we at Cloudflare could help!

~~~
eeeeeeeeeeeee
I will also say Cloudflare has saved sites I’ve managed numerous times. I’ve
been null-routed by major data centers over a few Gbps, and they wouldn’t give
us any options. It was always “wait.”

Most of the DDoS-protection providers in this space are insanely expensive, so
I’m very glad Cloudflare exists!

------
iforgotpassword
I hope another takeaway from this was to check at a lower level much sooner.
At least the way it reads, you spent quite some time suspecting your app was
at fault or the tech stack was goofing out. htop should have shown high CPU
usage from the DB process right away; traffic was probably higher than usual,
and access logs are always a good thing to check too.

------
lewilewilewi
The lesson here appears to be that the balance between feature development and
fixing technical debt isn't obvious: these things are measured historically,
and you only know that you've got it wrong after the fact. In fact, if you run
the exact same codebase and don't suffer the DDoS, did you get the balance
right after all?

------
MerlinW
    -A INPUT -p tcp --dport <port> -m limit --limit 25/minute --limit-burst 100 -j ACCEPT

(Note this ACCEPT rule only throttles if a later rule drops the packets that
exceed the limit and fall through.)

------
pravda
Why not just block all visitors with a curl user agent?

Let me just say, that's a really dumb DDoS attack.

Edit: one other thing, we spell it _psych_, not _sike_.

~~~
astura
>we spell it psych and not sike.

I have been informed by someone a generation younger than me that "the kids"
are intentionally spelling it "sike" these days.

Wiktionary lists it as a variant of "psych."

[https://en.wiktionary.org/wiki/sike](https://en.wiktionary.org/wiki/sike)

~~~
pravda
Oh, gosh. Well, that makes sense, in a way.

I confess I did have to google how to spell 'psych' properly. I started from
'pysch'.

~~~
astura
I always was sure it was spelled psych, since it's basically short for
"psyched out" and I had always seen it spelled that way.

But the topic recently came up in conversation with my teenaged niece, she
claims it's definitely spelled "sike" even though she's fully aware of the
etymology.

I chocked it up to a "kids these days" generational kinda thing, like how we
got "phat" in the 70s and "kewl" in the 90s, both of which are now in the
Oxford English Dictionary.

~~~
kurtisc
>I chocked it up

chalked it up ;)

------
yyx
Can anyone recommend monitoring solutions to help identify these issues?

~~~
jeffshek
1. CloudFlare. That's like the first thing I normally do for any new project
(which the author then switched to).

2. Your clients should not be the first to alert you that your site is down.
Pingdom does a great job of alerting you before your customers/users do.

3. The author brought up that some of the queries weren't paginated and ran
expensive SQL, so there are a few options. Since he's using Django, there are
some Django-specific options in this list.

A) Implement a backend cache that returns the JSON response (throw it in
Redis). Cache it and return that from the backend.

B) Add a Django throttle to the view (can be done via IP / username).

C) Allow only logged-in users to access the endpoint (harder to do on the fly,
though, since you need to make changes to your frontend). If a logged-in user
is causing you hell, turn off signups and kick that user off.

D) Have CloudFlare cache a public response for you on endpoints and return it
(you need to make sure the JSON is always the same for every API call, though,
which is very risky).

E) The author brought up that DRF JSON serialization is slow. Another
alternative is to use Serpy, which can see a 50-100x speedup (sketched below).
I'd only recommend that for complex JSON payloads; not because it's hard, but
because it's additional complexity.
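
A minimal illustration of what swapping in Serpy looks like (the field names
are made up):

    # A Serpy serializer mirroring a DRF one. Serpy serializers are
    # plain read-only classes, which is where the speedup comes from.
    import serpy

    class TaskSerializer(serpy.Serializer):
        id = serpy.IntField()
        content = serpy.StrField()
        done = serpy.BoolField()

    # Usage mirrors DRF: TaskSerializer(queryset, many=True).data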

The author is also using Dokku, which is fine for most projects, but you'd
imagine at some point it'll probably be switched to a load balancer + web
machines. Alerting can also be set at the load-balancer level if usage goes
above a % threshold.

Since he's using Dokku (so, by definition, Docker), he could probably use a
log-aggregation service (Papertrail, etc.) to get at his logs much faster and
see what's going on.

Monitoring CPU usage would also be helpful here, but I'm not sure if Dokku
allows that.

------
emmanueloga_
I recently started reading the book "Release It", which includes a lot of
great techniques to avoid problems like the one described in the article at
"design time". [1]

1: [https://pragprog.com/book/mnee/release-it](https://pragprog.com/book/mnee/release-it)

~~~
btrettel
Website doesn't work with JS turned off. It's even worse: It's one of those
sites that redirects you to another page to tell you to turn JS on but doesn't
allow you to go back in your browser. So even if you do turn on JS you have to
go get the link again. (Many scientific journals have the same problem but
with cookies instead of JS.)

------
RHSeeger
Why is the text for this article 2.5" wide in a normal browser? It makes it
annoying to read :(

~~~
sergiomattei
Thanks for the feedback! Will modify and enlarge the font a little bit.

~~~
jacobolus
The font size is fine. The width of the text block is absolutely ridiculous.

[http://webtypography.net/2.1.2](http://webtypography.net/2.1.2)

[https://practicaltypography.com/line-length.html](https://practicaltypography.com/line-length.html)

~~~
jessaustin
I opened the print dialog, and discovered that this short piece would take 65
pages to print. Yes, that is a sign that the width of the text block is
absolutely ridiculous.

I couldn't figure out what's going on with my laptop, because the inspector
short-circuited this goofy behavior. On a larger screen, I notice that the
".card-content" div has a ridiculous 150px of padding. It is nested in a
".card.blog-content-card" div, which has an atrocious 200px of margin. That in
turn is nested in a ".blog-post-container.container" div, which has a merely
unseemly 93px margin.

After 886px is used for white space, there's not much screen left for text.
Might want to fix that?

------
pmlnr
Nginx rate limit the endpoint, log offenders, auto fail2ban.

------
Leo_Verto
The UFW rule not having worked for you may have been Docker's fault.

If your gateway/webserver is running in a Docker container and you've
published port 80/443, Docker will set up its own iptables rules, bypassing
anything you've set up using UFW.
------
jamesmawm
Good read. Some thoughts:

- Add throttling at the nginx level

- Proactive monitoring and alerts needed

- Should fool the attacker into querying a fake endpoint

------
Gnunix
Hah. I remember the first time my server got DDoS'ed. I was scared shitless
and I was stressing so much. Glad everything turned out OK in the end.

------
aytekin
What doesn’t kill you makes you stronger.

~~~
jvln
My version of the saying is: what does not kill you cripples you.

~~~
jeffwass
“What doesn’t kill you makes you smaller” - Super Mario

------
csf333
get yourself a low orbit ion cannon & render all your enemies baseless

------
Zardoz84
This is ironic. I'm just now getting a DDoS from a botnet in China.

~~~
breakingcups
That's not ironic.

------
fxfan
Does anybody have experience of getting DDoS'd? All I see are 3 offending IP
addresses in the screenshot, and it makes me wonder how many is typical.

I have never been DDoS'd, and all I ever receive are failed SSH attempts with
simple passwords. Pretty much the easiest thing to tackle. But I'd love to
know from DDoS'd people what their attacks looked like.

From the Cloudflare logs all I see is a single IP address being blocked
(multiple times? Or is it their multiple actions being blocked?).

EDIT: Thanks all, these were helpful answers!

~~~
jeffwass
Anybody know why blocking the offending IP didn’t work?

~~~
Jeraz0l
Most likely because he was trying to block the IP at a point where the request
had been forwarded by some other service. At that point, the offending IP
would only exist in the X-Forwarded-For HTTP header and not as the source
address of the request.
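
For illustration, behind a reverse proxy the real client address has to be
recovered from that header before any IP-based blocking can work; a sketch
(trust only the hop appended by your own proxy):

    # Behind a reverse proxy, REMOTE_ADDR is the proxy's address, and
    # the client appears in X-Forwarded-For. Only the rightmost entry
    # (appended by your own proxy) is trustworthy; anything to its left
    # is client-supplied and spoofable.
    def client_ip(request):
        forwarded = request.META.get("HTTP_X_FORWARDED_FOR")
        if forwarded:
            return forwarded.split(",")[-1].strip()
        return request.META["REMOTE_ADDR"]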

~~~
sergiomattei
Yup! This was it.

