Hacker News
My first DDoS attack for a $200 ransom (ghirardotti.fr)
96 points by LaurentGh on May 5, 2016 | 66 comments

Roughly, a somewhat lackluster response to a somewhat lackluster DDoS attempt.

They tried blocking specific ip addresses, which didn't work, because the attack was somewhat distributed. They then just turned on some caching, which allowed the site to function, albeit with an unknown excess bandwidth charge pending.

And the DDoS itself can't have been terribly impressive, as all it took to mitigate it was a bit of caching. He mentions 10 requests / sec as the scale of the attack.

Thinking on this some more, this story makes even less sense.

He first mentions having to change Apache to recognize X-Forwarded-For, because there is Amazon Elastic Load Balancing between his site and the internet.

This means, of course, that the "attacking ips" aren't making direct connections to his EC2 instance. They are proxied connections, all from the internal ELB service.

So later, when he mentions trying to use iptables to block traffic...that just doesn't make sense. There are no connections from those ips to the EC2 instance. You could use .htaccess rules, since Apache is aware of X-Forwarded-For.

Lastly...why would you put an elastic load balancer in front of a single web server?

You do this by telling iptables to look at the X-Forwarded-For header when deciding what IP that request is coming from.

This blog post explains the whole thing: https://centos.tips/fail2ban-behind-a-proxyload-balancer/

I have no idea if using .htaccess rules would be better than this solution, I just know that this one works.

The article shows the run of the mill iptables syntax being used, no packet inspection...

And, it's possible this is https, which would render the packet inspection useless.
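For what it's worth, the application-level alternative discussed above (matching on X-Forwarded-For rather than on the TCP peer address) can be sketched roughly as follows. This is a minimal Python sketch, not the article's setup: `client_ip_from_xff`, `is_blocked` and the `BLOCKED` set are hypothetical names, the addresses come from documentation ranges, and it assumes the load balancer is the only proxy and appends the real client address as the last entry of the header.

```python
from typing import Optional

# Hypothetical block list; 203.0.113.0/24 is a documentation-only range.
BLOCKED = {"203.0.113.7"}


def client_ip_from_xff(header_value: Optional[str], peer_ip: str) -> str:
    """Pick the address that block rules should be applied to.

    Assumption: exactly one trusted proxy (the load balancer) sits in front
    of the server and appends the connecting client's address as the LAST
    entry of X-Forwarded-For. Earlier entries are client-supplied and can
    be forged, so they are ignored here.
    """
    if not header_value:
        return peer_ip  # no header: fall back to the TCP peer address
    entries = [part.strip() for part in header_value.split(",") if part.strip()]
    return entries[-1] if entries else peer_ip


def is_blocked(header_value: Optional[str], peer_ip: str) -> bool:
    """True when the resolved client address is on the block list."""
    return client_ip_from_xff(header_value, peer_ip) in BLOCKED


# The TCP peer is always the load balancer (10.1.2.3 here), which is why a
# plain iptables rule on the attacker's address never matches.
print(is_blocked("198.51.100.2, 203.0.113.7", "10.1.2.3"))
print(is_blocked("203.0.113.7, 198.51.100.2", "10.1.2.3"))
```

The second call shows why only the last entry is trusted in this sketch: a client can put any address it likes at the front of the header.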

There are a few cases; some have mentioned scenarios where you might want SSL offload. Those are perfectly valid. I'll contribute another.

Let's say you have a single web server you need to have up all of the time. You need high availability, but not necessarily instant fail-over, because you want to keep your costs low, and so you don't want two instances running all of the time. It may not serve much traffic at all, so there isn't much load to spread. What you can do is place an ELB in front of the web server, set a condition to start a new instance if the page becomes non-responsive (i.e., a failure), and set the auto-scaling to "min 1, max 1." This way, you'll always have a pool of one server, that will automatically rebuild if the instance fails.

I admit, it's not a common use case, but it's one of the more clever uses of the ELB I've heard. =)

We do this. We have 50+ ASGs for the same reason. Often it's min=3, max=3 for HA, but if a machine fails it's automatically replaced.

Every service in our infrastructure runs the same way. Some have scaling policies, some don't.

DBs like Cassandra are also in ASGs, since if one of them terminates, a new one can come up and bootstrap without operator intervention.

I think you're absolutely right about iptables, as I didn't change anything to use the X-Forwarded-For IP, so this part might have been completely useless. About the ELB, it's because it's managing the SSL for us, and we used to have two servers behind it.

Everything could have been planned way better (cached, written in a fancy language...), we could have had 10 mil requests/µsec... the main idea was just to tell how we tried to manage the situation, with the website and skills we have.

I also think my testimony is closer to what most web devs can be confronted with, unlike a Cloudflare/GitHub BS press release written by 10 experts to increase valuation :p

You might use ELB + one EC2 to serve an SSL certificate. That takes the encryption load off of EC2 and is durable. AWS has a new SSL service though, but this was a recommended way until recently.

Exactly our situation :)

I can't answer for LaurentGh, but if I remember correctly it's a temporary situation (for a few months).

I was shocked that 12 requests/second could take down any site.

I use async logic (previously OpenResty, more recently NodeJS and Go) and largely pregenerated sites, so 2500 requests/second is a minimum baseline -- on a much lower end instance than an m4.xlarge.

There's a reason I don't use PHP (or any primarily synchronous language like Ruby) any more.

The language used is meaningless absent the context of the whole application, especially the database. I can do 2500 requests a second with Ruby or PHP on a micro instance on AWS. But that's meaningless after I plug into a MySQL database that's going to bottleneck at 2 requests a second once I try to render a bad WordPress theme out of MySQL.

That 2500 is after taking into account relevant database queries. With async servers (which, as far as I know, PHP can't be, at least with the way it typically integrates into Apache) you can accept all the requests, forward off the database requests each depends on, and still send back the pages to everyone who requested one.

Of course I've avoided WordPress and similar frameworks for years for a reason as well; I've seen front-end frameworks that require dozens if not hundreds of database queries to render a page, which is such bad design that it makes my brain hurt.

node.js does have its advantages, but it's possible to be very fast with php.

For example, the advantage of the "warm app" with node is often approximated by using a shared memory kv store in php.

And, while an async io approach scales in a simpler way, you can typically find an optimum tuning for fastcgi that scales very well.

Looking at this benchmark: https://www.techempower.com/benchmarks/#section=data-r12&hw=...

There are php-fpm implementations running at the same clip as node, and an hhvm implementation trouncing it.

Yes, benchmarks are sometimes bullshit, but the idea that node's approach is somehow light years ahead just doesn't pan out in the real world.

>benchmarks are sometimes bullshit, but the idea that node's approach is somehow light years ahead just doesn't pan out in the real world.

It's in the real world where you have database connections that can't always scale to 186k/second with low latency on all responses, which is what's required to get the performance out of HHVM in the linked benchmark. In a typical architecture you may not have the database local to the PHP server, meaning latency will be much higher. And it's upstream latency that kills a synchronous connection.

And it's in the real world where a number of third party queries may be involved with a request, and those queries may take 100ms to resolve, during which time your thread is blocked in PHP whereas Node can be busy handling other requests or even other aspects of the same request (a Promise.all of a half dozen simultaneous database queries, for instance).
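To make the Promise.all point concrete, here is a rough sketch of the same idea using Python's `asyncio.gather`, which plays a similar role. It is an illustration under assumptions, not anyone's production code: `fake_query` is a hypothetical stand-in that only sleeps, and the 10 ms latency is made up. Instead of timing anything, it records how many queries are in flight at once.

```python
import asyncio
from typing import Dict, List, Tuple


async def fake_query(name: str, latency_s: float, stats: Dict[str, int]) -> str:
    """Hypothetical stand-in for a database call; it only sleeps."""
    stats["in_flight"] += 1
    stats["peak"] = max(stats["peak"], stats["in_flight"])
    await asyncio.sleep(latency_s)  # the event loop is free during the wait
    stats["in_flight"] -= 1
    return f"{name}: done"


async def run_serial(names: List[str], latency_s: float) -> Tuple[List[str], int]:
    """Await each query before starting the next (blocking-style flow)."""
    stats = {"in_flight": 0, "peak": 0}
    results = [await fake_query(n, latency_s, stats) for n in names]
    return results, stats["peak"]


async def run_parallel(names: List[str], latency_s: float) -> Tuple[List[str], int]:
    """Start every query at once and wait for all of them together."""
    stats = {"in_flight": 0, "peak": 0}
    results = await asyncio.gather(*(fake_query(n, latency_s, stats) for n in names))
    return list(results), stats["peak"]


names = [f"q{i}" for i in range(6)]
serial_results, serial_peak = asyncio.run(run_serial(names, 0.01))
parallel_results, parallel_peak = asyncio.run(run_parallel(names, 0.01))
print(serial_peak, parallel_peak)  # prints "1 6"
```

Both variants return the same results in the same order; the difference is that the serial version pays the latency six times while the gathered version pays it roughly once.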

Async approaches are light years ahead in the real world. I keep seeing references to HHVM when people try to defend PHP, so I did a Google search to find out if it supports async, and the answer is "a little bit." [1] Basically it looks like, within a single request, it can execute several queries in parallel, like my Promise.all example above. So it looks like Hack, at least, has that feature. But as far as I can tell it doesn't mix multiple connections in a single thread. And you have to be careful to use only async-aware operations or you lose the benefits; Node is designed around async behavior, so everything you're likely to use supports it by default.

Those benchmarks are in fact unrealistic. And the Node implementation in that benchmark uses a single connection to MySQL, so the 20 queries are actually executed in series instead of in parallel. [0] If they used a connection pool instead, they could all execute in parallel. Look at the numbers:

    Queries        1       5      10      15      20
    nodejs    85,490  22,917  12,083   8,250   6,254
    hhvm      12,369  12,428  10,056  10,394   9,322

The NodeJS results shouldn't drop that fast off of the single query unless they're all using the same MySQL connection; even using just 4 connections from a pool should speed it up by a lot. Also, the HHVM source uses stored procedures while the Node version recompiles the query every time. Finally, Node can be faster when driven from Nginx, while it's using the internal Node server instead.
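A toy version of the "4 connections from a pool" idea, again in Python and again only a sketch: `TinyPool` and `run_queries` are hypothetical names, the "connections" are just integers, and the query is a 10 ms sleep. It is not the benchmark's code or any real MySQL driver's API; it only shows 20 queries sharing 4 connections instead of queueing behind one.

```python
import asyncio
from typing import List


class TinyPool:
    """Toy connection pool that hands out at most `size` fake connections."""

    def __init__(self, size: int) -> None:
        self._free: asyncio.Queue = asyncio.Queue()
        for conn_id in range(size):
            self._free.put_nowait(conn_id)

    async def run(self, latency_s: float) -> int:
        conn_id = await self._free.get()  # wait until a connection is free
        try:
            await asyncio.sleep(latency_s)  # pretend to run the query
            return conn_id
        finally:
            self._free.put_nowait(conn_id)  # hand the connection back


async def run_queries(n_queries: int, pool_size: int, latency_s: float = 0.01) -> List[int]:
    """Run n_queries concurrently; return the connection id each one used."""
    pool = TinyPool(pool_size)
    return list(await asyncio.gather(*(pool.run(latency_s) for _ in range(n_queries))))


used = asyncio.run(run_queries(20, 4))
print(len(used), sorted(set(used)))  # prints "20 [0, 1, 2, 3]"
```

With one connection the 20 queries run back to back; with four they run in about five waves, which is the kind of speedup the comment is describing.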

Here's an article I just stumbled across that talks about the limits of HHVM to accelerate PHP: [2] It touches on some of the same points. Async is the important way to speed up real world apps, and it's not the default "PHP Way."

Aside from that, probably 90% of the people using PHP are using "normal" PHP on a hosted server in Apache and not HHVM at all. So you're basically arguing hypotheticals: "IF they use HHVM, and IF they write the code exactly right, and IF their upstream servers have really low latency all the time, then PHP isn't much slower than Node."

Node is usually faster in the real world given common coding patterns. And Node gives you Socket.IO, which pretty much kills the relevance of PHP no matter how you slice it. Even long polling would slaughter a PHP server; you'd be able to support at most one concurrent user per thread. Async servers are good at supporting long-running connections.

[0] https://github.com/TechEmpower/FrameworkBenchmarks/blob/mast...

[1] https://docs.hhvm.com/hack/async/introduction

[2] http://www.infoworld.com/article/2948132/app-servers/hhvm-38...

From what I remember of symfony (it's been years) their orm had some nasty memory leaks.

If the request is performing heavy calculations you will see fewer req/sec, obviously.

Say an API call spins up a Linux VM and makes it available to some user. Or a bulk upload of data which needs to be indexed. Or whatever.

The idea that a site should be able to handle X requests/sec because the stack can handle X NOOPs per second is odd.

Not talking NOOPs. I'm talking multiple full round trips to the database per query.

Node can render more NOOPs per second than that; I've heard of a well tuned Node server hitting 100k. But because of the async nature of the handling of responses, you don't need dozens or hundreds of threads to handle thousands of clients, and it's the threads that kill you.

Unless you've just got an awful architecture, in which case that will kill you first.

I still don't understand what point you are trying to make. Surely you must agree that the number of requests per second your server will handle depends on what the server is doing? If your operation calls a DB or calculates something and that is heavy, then your db or internal calculation service might run out of resources. I'm not sorry, but your lack of experience with large scale systems is visible.

Most web sites are just grabbing data from databases. If you use an async front-end, it's the database servers that need to scale, because they're doing the heavy lifting. But scaling your database infrastructure can be orthogonal to scaling the application layer; in Node, except for under extreme load, you may never NEED to scale the application layer. And for really high load situations, you may need 25-50x fewer Node servers than PHP.

It's the poor design of most systems that causes them to not scale; there are certainly exceptions where the server needs to do a lot of complex calculations, but those calculations can most of the time be handled in microservices and, again, scaled orthogonally to the application layer.

PornHub runs on a PHP stack. Do you think that maybe they get a little more than 12 requests a second?

I wouldn't blame that on PHP. You can make PHP sites with reasonable performance.

A well configured small / medium instance should easily handle 100 requests/second. My test PHP setup on micro instances serves 1000 requests/second before any signs of slowdown. [1]

[1] https://nestify.io/wp-content/uploads/2015/10/loader.io_.png

Every application is different and has different requirements. I know absolutely nothing about the original author's application so I can't comment on any specifics. But just because your "test PHP setup" can handle X amount of r/s does not suggest that any other application should be able to do the same.

A better test would be to disable any cache plugins in your wordpress and then run tests against your site.

Put a database behind it with complicated enough requests, and you'll run out of connections to the db regardless if you have async io. You'll have to add a connection pooler or multiple master/slave dbs - and even then you will run into problems with enough traffic. Not everything is sending back some json from something that is readily available.

To really scale a complex architecture, certainly there are other things you need to do, like setting up database replication and/or sharding, caching, and other scaling strategies.

That can typically be done from Node by simply adding the "use pooling" option in your database library (or sometimes by switching to the NPM for that database that enables pooling), and when you have additional slave databases, adding those to the database init call.

As far as Node is concerned, it really is that easy. Scaling PHP, though, pretty much requires that you add more threads, which means (after a point) you need a lot of RAM, or just adding more instances and load balancing between them.

Node won't prevent database scaling issues, but it will keep the part that it does handle a lot easier to maintain.

But can you do pregenerated sites all the time? Like in my case I have a search page, and some dynamic things, I should generate a huge amount of pages (but while writing this I'm thinking that it could be feasible though a bit complicated to regenerate in numerous situations)

I bet you I can write a website in NodeJS or Go that will fail with fewer than 1 request / second. Heck, I bet you I can make a website that will fail even if it only receives one request in its lifetime!

The language usually isn't the reason for issues like this...


For tasks that are primarily IO bound, async architectures can scale more than synchronous languages. Period.

It would take intentionally (or newb/cluelessly) bad design to end up with a Node server that DOSed at 12 connections per second.

In PHP, if you run 4 threads, it just takes a backend with a 333ms latency (on all queries performed serially) to limit you to 12 connections per second. If you only run one thread, you just need a backend with a cumulative 83ms latency to get DOSed at 12 connections per second. In a more realistic scenario, a typical crappy framework will result in dozens of queries for a single page, but it comes down to the same thing.

In Node, you can run one thread with a cumulative 333ms backend latency and still handle thousands of connections per second. They'll each just end up waiting a bit more than 333ms for their results, assuming the database itself isn't DOSed (which takes a surprisingly high load -- way above the levels we're talking). Actually, depending on how interconnected the backend queries are, Node may actually result in less than a total 333ms latency, because many of those queries may also be parallelized by the browser, and will then be handled in parallel by the server (and much of the latency may actually be in http negotiations and/or establishing a database connection, honestly).
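The arithmetic behind those figures is short enough to write down. This is only a back-of-the-envelope sketch in Python: the function names are made up for illustration, the blocking model assumes each worker handles exactly one request at a time, and the 2,000 requests/second figure for the async case is an assumed example of "thousands", not a measurement.

```python
def blocking_ceiling(workers: int, latency_ms: float) -> float:
    """Upper bound on requests/second when each worker is tied up for
    `latency_ms` per request (one request per worker at a time)."""
    return workers * 1000.0 / latency_ms


def in_flight(requests_per_second: float, latency_ms: float) -> float:
    """Average number of requests waiting at once (rate x latency).
    An async server holds these open instead of needing a thread each."""
    return requests_per_second * latency_ms / 1000.0


print(round(blocking_ceiling(4, 333)))  # 4 threads, 333 ms backend -> 12
print(round(blocking_ceiling(1, 83)))   # 1 thread, 83 ms backend   -> 12
print(in_flight(2000, 333))             # assumed 2000 req/s        -> 666.0
```

In the blocking model, getting past that ceiling means more threads; in the async model the same latency just means about 666 requests parked on the event loop at any moment.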

>>It would take intentionally (or newb/cluelessly) bad design to end up with a Node server that DOSed at 12 connections per second.

Single threaded makes for its own pitfalls. I assume you can imagine some cpu bound tasks that would have node.js at 12 connections/sec or less.

>I assume you can imagine some cpu bound tasks that would have node.js at 12 connections/sec or less.

Certainly. That would fall under "newb/clueless" design, though. Anyone who would throw a CPU-bound task into a primary Node server shouldn't be allowed near architectural design decisions. Whereas code in PHP written using best practices can easily end up with a server that can barely hit 100 queries per second.

Imagine, for instance, a situation where the client needs to do 50 requests to the server to render a page [1][2], and each query ends up with 20ms of latency on the PHP side; assuming you're running 8 threads (and the client makes 8 concurrent requests), a single page query could block your server for 125ms. A slow client or network might even block your PHP threads for longer. Node could crunch through ten thousand requests like that per second when running on four CPUs, meaning 125 of these bloated pages rendered per second, compared to ... 8 or less.

Even with decent client pages that can render with ONE server query, a couple dozen database lookups are par for the course, sometimes including an authentication lookup on a third-party OAUTH server. That could be 125ms all by itself, and in PHP your thread is blocked while the lookup happens. With the async model, once the query is off, the server is doing work on other requests until the query has returned data.

Many CPU-bound tasks like "convert an image" are already coded in Node to happen in the background, triggering a callback when they're done so you can then send the result to the client. And in Node it's absolutely trivial to offload any likely CPU-bound task to a microservice, where the NodeJS server just queries the microservice and waits for the result. Which you'd want to do, of course, if a task is CPU-bound, because you would want a faster server than V8 running it anyway. Go would be a likely candidate, and Go handles threading either through light/async threads or via actual threading, as necessary. It's quite awesome.

And if you really can't trust your developer to write code without extensive time-consuming calculations, then make them use Elixir or Erlang. It will use preemptive multitasking at the VM level if a thread takes up too much time, and even if they foolishly write a task that takes hundreds of milliseconds to complete, it will still task swap and serve other clients.

But arguing that pathologically bad code in Node can make it perform as badly as PHP does all the time isn't exactly a ringing endorsement for the language.

[1] In 2014 the average number of objects a web page requested was 112, and seemed to continue to be going up, though I'm assuming a lot of those are static resources and third party requests, like for analytics and ads. http://www.websiteoptimization.com/speed/tweak/average-web-p... I've personally seen pages with 70-80 requests against a PHP backend to render one page.

[2] And I wouldn't call a client page needing 50 requests a best practice, but I'm assuming that we're talking about the server side here, and that we are being forced to deal with an existing client that behaves that way. So call it "best practices on the server."

The webpage[0] seems to be having issues. The best I could do was the Google cache[1] or the Markdown source[2].

[0]: http://lologhi.github.io/symfony2/2016/04/04/DDoS-attack-for...

[1]: https://webcache.googleusercontent.com/search?q=cache:J7lca_...

[2]: https://github.com/lologhi/lologhi.github.com/blob/master/_p...

This is an amazingly weak DDoS, put your site behind CloudFlare or similar free service and go take a nap. They'll tank this without raising an eyebrow.

probably because I've been playing an mmo, but i like the use of the word 'tank' here

The etymology here is actually really interesting. The term "tank" was an Americanism for a swimming pool once upon a time, and the etymology of that is pretty well-known: it was imported from a Portuguese word meaning "pond" which ultimately came from India [1]. The same source quotes a 1960s usage of the term for "failure" in the sport of Tennis.

The term actually seems to come from a 1920s boxing euphemism [2]: when a boxer is not actually knocked out but voluntarily lays down on the ground, it was called "a dive" for obvious reasons; euphemistically some people called this "going into the tank," since you'd dive into a pool.

How did this start referring to the vehicle? Again, back to [1], there was once a memo "recommending the proposed "caterpillar machine-gun destroyer" machines be entrusted to an organization "which, for secrecy, shall be called the 'Tank Supply Committee,' ..." and the rest is history.

[1] http://www.etymonline.com/index.php?term=tank

[2] http://www.slate.com/articles/news_and_politics/explainer/20...

Yep, true, it's planned. But sometimes their captcha page tends to block some legitimate traffic...

It's not that impressive because every day we read articles about crazy DDoS attacks that big companies are able to mitigate. But when it's the website you're responsible for, whatever the number of requests/sec, you just need to find a way to manage it, and CloudFlare can have some weird side effects.

> 40 cores [m4.10xlarge], but still unable to process 10 requests/sec

my goodness.

That's php for you. Although I use php myself quite often, it can be a resource hog if you're lazy about optimization. A customer I was working with was using wordpress, and their homepage took about 5 seconds to load due to a hideously inefficient wordpress module that was doing the exact same sql query thousands of times! With a little bit of optimization I managed to get it down to about 1 or 2 seconds.

For my own sites, I mostly use static html or server-parsed html.

The overhead of an SQL query has nothing to do with the language you're using.

Of course I'm aware of that. I was just giving an example of a specific case I came across. The point I was making is that there is a lot of inefficient php code out there.

Recently a colleague and I took a PHP+MySQL page that was taking 1-2 minutes to load and were able to drop it down to 30 seconds after our first round of revisions. After a second round of revisions, we got it under 15 seconds, then after another round we got it to the 1-5 second range. Most of the optimizations were in the PHP not in the SQL.

I came to make the same point - but PHP itself is not directly the problem.

There are many, very popular Wordpress plugins that take hundreds of SQL queries just to render a landing page. 10 hits/sec legitimately is "DDoS" territory for many businesses running such things.

Ummmm.... A cache layer for any web application is a must have, perhaps he could have avoided the attack all along if it were present on the system since day one?...

At least for this kind of attack, a more serious DDoS won't be tamed by "just adding cache"

Well, when your DDOS is '10 requests a second', your site is probably not the most sophisticated.

Haha, true! But please share your personal DDoS experience if mine is not large enough :p

Oh goodness, I would, but I would need to talk to my employer. I work for a CDN, so our DDOS experience is very different :)

I also hope I didn't sound too insulting. I know keeping any site up against DDOS is hard!

No it's ok! Thank you ;-)

For next time you don't want to have to copy and paste. No need for SED.

    cat <file> | cut -d ' ' -f1 | sort | uniq -c | sort -nr

I'd suggest replacing the use of cat with:

    tail -n 10000 apache.log

no need for cat

You're correct, force of habit.
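If you'd rather do the same count from a script, here is a rough Python equivalent of the pipeline above. It is a sketch under assumptions: `top_clients` is a made-up helper name, the sample lines are invented (documentation-range addresses), and like `cut -d ' ' -f1` it assumes the client address is the first space-separated field of each log line.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_clients(lines: Iterable[str], limit: int = 10) -> List[Tuple[str, int]]:
    """Count the first space-separated field of each line, busiest first
    (roughly `cut -d ' ' -f1 | sort | uniq -c | sort -nr`)."""
    counts = Counter(line.split(" ", 1)[0] for line in lines if line.strip())
    return counts.most_common(limit)


# Invented sample lines in a common access-log shape.
sample = [
    '203.0.113.7 - - [05/May/2016:10:00:01 +0000] "GET / HTTP/1.1" 200 512',
    '198.51.100.2 - - [05/May/2016:10:00:01 +0000] "GET / HTTP/1.1" 200 512',
    '203.0.113.7 - - [05/May/2016:10:00:02 +0000] "GET / HTTP/1.1" 200 512',
    '203.0.113.7 - - [05/May/2016:10:00:03 +0000] "GET / HTTP/1.1" 200 512',
]
for address, hits in top_clients(sample):
    print(hits, address)
```

To run it on a real file, pass an open file object (for example `top_clients(open("apache.log"))`, with the path being whatever your log is called). Behind a load balancer, the first field is only the real client if the log format has been changed to record X-Forwarded-For.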

Apparently, it didn't work. :)

Site not installed The site ghirardotti.fr is not yet installed

[Edit: it's up now.]

Though, it is interesting how an article with a dead link made it on the frontpage with 3 points.

It's working well for me. And as it's a GitHub website, it should be ok...

And there are 73 persons on it right now if I believe Google Analytics

[ Edit: I'm almost certain it's an ipv6 vs ipv4 issue. The ipv4 addresses resolve to github pages land, the ipv6 address resolves to somewhere inside OVH. The issue is that if the viewer's network infrastructure prefers ipv6, they will get a holding page from OVH stating that "Site not installed / The site ghirardotti.fr is not yet installed" ]


  dig ghirardotti.fr
  ;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 19860
  ;; flags: qr rd ra ; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0 
  ;; ghirardotti.fr.	IN	A

  ghirardotti.fr.	3112	IN	A
  ghirardotti.fr.	3112	IN	A

Both ip's belong to Github, probably used as a failover/load balancing/round robin pair of some kind.

reverse dns

dig -x

  ; <<>> DiG 9.8.3-P1 <<>> -x
  ;; global options: +cmd
  ;; Got answer:
  ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 46826
  ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

  ;	IN	PTR

  ;; ANSWER SECTION: 3593 IN	PTR	pages.github.com.

dig ipv6:

  dig AAAA ghirardotti.fr

  ; <<>> DiG 9.8.3-P1 <<>> AAAA ghirardotti.fr
  ;; global options: +cmd
  ;; Got answer:
  ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 0
  ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

  ;ghirardotti.fr.			IN	AAAA

  ghirardotti.fr.		1613	IN	AAAA	2001:41d0:1:1b00:213:186:33:19

  ;; Query time: 91 msec

Reverse dns on that:

  dig -x 2001:41d0:1:1b00:213:186:33:19

  ; <<>> DiG 9.8.3-P1 <<>> -x 2001:41d0:1:1b00:213:186:33:19
  ;; global options: +cmd
  ;; Got answer:
  ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 25262
  ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0


  85728 IN PTR cluster010.ovh.net.
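The same check can be scripted. Below is a small Python sketch of the idea: list every address a name resolves to and split them by IP version, so a stray AAAA record pointing somewhere else stands out. `split_by_family` and `resolve_all` are hypothetical helper names, and since the IPv4 addresses are not shown in the dig output above, the example uses a documentation-range placeholder for the IPv4 side.

```python
import ipaddress
import socket
from typing import Iterable, List, Tuple


def split_by_family(addresses: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split textual IP addresses into (ipv4, ipv6) lists."""
    v4: List[str] = []
    v6: List[str] = []
    for text in addresses:
        ip = ipaddress.ip_address(text)
        (v6 if ip.version == 6 else v4).append(str(ip))
    return v4, v6


def resolve_all(hostname: str, port: int = 80) -> List[str]:
    """Every address the system resolver returns for hostname (A and AAAA).
    Needs network access, so it is defined here but not called."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


# 192.0.2.10 is a placeholder; the AAAA value is the one from the dig output.
v4, v6 = split_by_family(["192.0.2.10", "2001:41d0:1:1b00:213:186:33:19"])
print("IPv4:", v4)
print("IPv6:", v6)
```

On a machine with network access, `split_by_family(resolve_all("example.org"))` would show both families for a real name; a client that prefers IPv6 will normally try the second list first.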

If they're using OVH they already have free DDoS protection. I use OVH myself and I've had a few mitigated DDoS attacks that I barely noticed.

Perhaps 10 requests/sec is below OVH's detection threshold. My last one was 8.4Gbps @ 1 million packets per sec. 10 requests/sec would be difficult to even notice :)

Nope, OVH is just for our domain name. Also, this is about my blog, not the https://www.villa-bali.com website, which is on AWS.

Thank you for this investigation! I've learned how to use dig! I've just removed the wrong AAAA record, as there is no IPv6 for Github Pages.

Well, typical SLA for server side is 500 ms, then you have a chance to load a whole page under 3 seconds, which is recommended by google usability findings.

villa-bali is not even close to this; my bet is that you (or your ORM) are making too many requests to the database. Try to record ALL requests to the database during page rendering and I bet you have about a hundred. Check out the following test results:

8 test agents: http://loadme.socialtalents.com/Result/ViewById/57341f645b5f... - 5% of users have to wait more than 2 seconds

16 test agents: http://loadme.socialtalents.com/Result/ViewById/57341f1a5b5f... - 5% of users need to wait for more than 4 seconds.

Definitely, any bot can nuke your website easily.

How come the original post has 55 upvotes, but the karma of the original poster is only 18 (6:33 PM GMT)?

Users receive one karma for one comment upvote, but one submission upvote yields less than one karma for the user.

I wonder what would happen if GET / only returned a redirect to somewhere (either an HTTP code or an HTML page with window.location='http://yoursite.com/new_page')

> 40 cores, but still unable to process 10 requests/sec

Stopped reading after that.
