
Ask HN: How would you handle a large traffic spike (e.g. being on the HN front page)? - atarian
What strategies would you use to prepare or respond to a large spike of traffic?
======
patio11
This depends almost entirely on what you're doing. If the site at issue isn't
dynamic, my suggested action is probably "do nothing", because being
frontpaged by HN is not going to tax any computer routinely used to serve web
pages in 2017.

If you genuinely have problems, then the question boils down to "What is going
to break first?" and whether it makes more sense to harden that or temporarily
disable it. If you're a flight search engine and responding to flight searches
is just intrinsically costly, then you probably need more capacity and/or a
way to "shed load" and redirect folks into some sort of queue or alternative
UX ("We're overloaded; give us your email address and we'll get back to you.")

If the dynamism on your pages is something that is incidental to their
functionality, fakeable, or chosen simply for programmer convenience, you can
dial down the dynamism for the time being via e.g. sticking a cache in front
of the page, serving a static HTML version of it, pre-baking the default
search/etc rather than recomputing it live for each of the 40k sessions, etc.
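
To make "pre-baking" concrete, here's a sketch of a cron-driven script that
runs the expensive default query once and writes the result where the web
server serves static files; every name in it is a hypothetical placeholder,
not anyone's actual app:

    # Hypothetical cron job: render the default search once, serve it statically.
    import os
    import tempfile

    from myapp.search import run_default_search, render_results_page  # placeholders

    html = render_results_page(run_default_search())  # the expensive part, run once

    # Write atomically so the web server never serves a half-written file.
    fd, tmp = tempfile.mkstemp(dir="/var/www/static")
    with os.fdopen(fd, "w") as f:
        f.write(html)
    os.replace(tmp, "/var/www/static/search-default.html")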

The most common "HN killed my website" case is probably WordPress being served
by Apache with KeepAlive on. That isn't simple to remediate if you continue
serving WordPress from Apache; this would be one of my fairly few cases where
the fundamental technology choice itself is the problem. (It is possible this
has changed in recent years, but for a period of several years "apt-get
install php apache2" would get you an install guaranteed to blow up in
production with more than 10 simultaneous users.)

~~~
Velox
Absolutely nothing. I've had my personal site reach the top a couple of times
and it's been totally fine each time. My site is static and runs on a $5
Digital Ocean droplet. The traffic barely even made it blink.

~~~
Velox
Just realized that my app posted this as a reply to the top comment rather
than to the post itself.

------
rubyfan
Caching.

I used to be responsible for sites involved with the Olympics. For 100 weeks
traffic was next to nothing. For the two weeks preceding and two weeks during
the games there really was spike after spike after spike.

The site basically used nginx with SSI to serve up static content that was
generated from Rails. Almost everything we could make static was. We rsync'd
files every minute or so across the cluster, and an individual node would
lazy-load content if needed.

For dynamic stuff we figured out tricks using JS, APIs, and memory-cached
dynamic partials. I wouldn't recommend any of that unless you really have the
need, though.

~~~
limeblack
Did you really use rsync? I figured there might be something more modern.

~~~
rubyfan
Yeah, this was back in 2010-ish.

I explored a lot of distributed file systems (NFS, Ceph, GlusterFS, GridFS,
some others), really looking for something that would share files across the
cluster once one node had generated them.

------
alain94040
Turn on caching. That's it. Even at #1 on HN, you're looking at no more than
10-20K visitors in a day. It's not _that_ much traffic.

------
iEchoic
To prepare:

1\. Load test and fix what breaks. At its fastest and least sophisticated,
this is just taking typical requests and throwing increasing numbers of them
at your server until it breaks (or, if you're testing in production, until
user experience begins to deteriorate). This is the most accurate way to
identify performance bottlenecks, and it also gives a rough estimate of your
total capacity. If you do nothing else, do this (a bare-bones sketch follows
this list).

2\. Make sure your failure state isn't disastrous (for example, if our servers
go down, users are still presented with a somewhat-functional webapp, not a
503 error page).

3\. Make sure you are able to (and know how to) identify when systems are
failing due to traffic and quickly add capacity to any component of your
system. Ideally you have SMS/email alerts for this (they're really easy to set
up in AWS, for instance).
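
A bare-bones version of the ramp-up in step 1, as a sketch (dedicated tools
like ab or wrk do this better; the URL and step sizes are placeholders, and it
should only ever be pointed at your own staging copy):

    # Throw increasing concurrency at a URL until errors or latency appear.
    import concurrent.futures
    import time
    import urllib.request

    URL = "https://staging.example.com/"  # hypothetical target

    def hit(_):
        start = time.monotonic()
        try:
            with urllib.request.urlopen(URL, timeout=10) as resp:
                resp.read()
                return resp.status, time.monotonic() - start
        except Exception:
            return None, time.monotonic() - start

    for workers in (10, 50, 100, 200, 400):
        with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
            results = list(pool.map(hit, range(workers * 5)))
        errors = sum(1 for status, _ in results if status != 200)
        worst = max(latency for _, latency in results)
        print(f"{workers:>4} workers: {errors} errors, worst {worst:.2f}s")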

"Hugs of death" (at least at the HN or even large subreddit scale) are not
usually caused by lack of raw computing power, they're usually caused by
architectural/algorithmic flaws exposed by unusual request volume. Send that
traffic yourself ahead of time, and then fix those.

This is essentially how sites and services for large hardware launches are
scaled (such as console launches), just with more sophisticated methods. I
took this approach with Guilded
([http://www.guilded.gg](http://www.guilded.gg)) and the hug from hitting #2
on a million-person subreddit only reached about 15% of capacity.

------
iMerNibor
Caching! Cache responses where possible and have the webserver serve them
instead of going to a dynamic backend (php, node, ruby, python, ...).

I haven't worked with Apache for a while (it is probably sufficiently
configurable too), but nginx is quite resilient and makes "basic" caching very
easy (proxy_cache and fastcgi_cache should cover you).
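
For illustration, a minimal nginx micro-cache along those lines; the paths,
zone name, and backend address are placeholders:

    # Cache dynamic responses briefly; serve stale copies while under stress.
    proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=microcache:10m
                     max_size=100m inactive=10m;

    server {
        listen 80;

        location / {
            proxy_cache microcache;
            proxy_cache_valid 200 301 10s;                 # even 10s absorbs a spike
            proxy_cache_use_stale error timeout updating;  # keep serving during hiccups
            proxy_pass http://127.0.0.1:3000;              # the dynamic backend
        }
    }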

------
twobyfour
If our current site got a 10x spike in traffic with little to no warning, the
only thing we could do to prevent it from keeling over would be to raise the
cache timeout on our caching proxy.

Why?

Well, we could easily add more application servers as needed. That would take
about 15 minutes, and we'd probably only need to increase the count by about
50%, as we've got plenty of spare capacity.

The real problem is our database server. Or rather, the way our CMS uses the
database. Any sort of traffic increase to the CMS hammers the DB to the point
of deadlocks (yes, on reads) for other portions of the site. We have some
plans to improve the situation (including changing some stupid DB config
decisions made a decade ago), but nothing that can be implemented short term.

Thankfully, the CMS content is fairly static, and our 1-min caching isn't very
aggressive. Increasing the cache timeout to 10-15 min would result in far less
backend traffic and get us through most traffic spikes. The rest of our site
is either available only to paying users (and thus far less likely to be
significantly affected by a traffic spike) or served primarily by other data
stores with much more room for capacity growth in their present configuration.

------
remx
Make everything static unless you really need some sort of dynamic content.
If you do need dynamic content, make sure to stress-test it. There are tools
out there to load-test your website and see if it breaks.

Put it on Cloudflare. Cloudflare can absorb huge volumes of traffic with ease.
Keep in mind there are other WAFs (Web Application Firewalls) you can check
out.

Use as few third-party widgets / bells and whistles as possible, and
self-host assets when you can. If these go down (which they will when your
site is trending on Hacker News), your site may not load correctly, leaving
your users frustrated. Remember the recent S3 failure? It broke thousands upon
thousands of sites.

~~~
jgrahamc
Also, if your site is dynamic but doesn't really need to update on every load,
you can set up Cloudflare to cache for a short period (e.g. 5 minutes).
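
One way to get that behaviour, as a sketch: a Cloudflare "Cache Everything"
page rule combined with a short max-age sent by the origin (nginx shown here;
the backend address and TTL are placeholders):

    location / {
        # Let the edge cache hold rendered pages for ~5 minutes.
        add_header Cache-Control "public, max-age=300";
        proxy_pass http://127.0.0.1:3000;
    }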

------
dmitrygr
If you don't make your website a megabyte-sized monster, it isn't a problem.
A single-core Pentium 2 can serve simple HTML/CSS to more users than HN has at
any given point in time. Anything more than that is just bloat.

------
bgammon
Prepare ahead of time. Short of using an elastic load balancing service,
predict when you may have traffic spikes, and choose a cheap-to-implement
solution such as request-level load balancing.

Create replicas of your web server processes on different machines for the
duration you expect a potential traffic spike. Use a fast dispatcher like
nginx[0] as a reverse proxy to load-balance requests to the appropriate
replica web server machine.

Once you see consistently low traffic again, spin down the replicas and remove
them from your load balancer configuration.

[0]
[http://nginx.org/en/docs/http/load_balancing.html](http://nginx.org/en/docs/http/load_balancing.html)
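
A minimal version of that nginx setup, with placeholder replica addresses;
spinning a replica down is just deleting its server line and reloading:

    # Spread requests across replica web servers.
    upstream app_replicas {
        server 10.0.0.11:8080;
        server 10.0.0.12:8080;
        server 10.0.0.13:8080;  # brought up just for the expected spike
    }

    server {
        listen 80;

        location / {
            proxy_pass http://app_replicas;
        }
    }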

------
BjoernKW
The last time (actually, the first time ...) a blog post on my website went to
the front page of Hacker News, WordPress with proper caching enabled (using WP
Super Cache with pretty much the default settings) worked just fine.

------
bigiain
A one-off (or hoped-for) event? I'd just stick CloudFront in front of it...

If I were hoping to build a high-traffic site (as in, I expected long-term
heavy traffic rather than a single "spike"), I'd work out how to most easily
implement my CMS's caching options with CloudFront or S3 or some other CDN.

The most important thing is to make sure a page view doesn't _really_ require
a bunch of db hits or personalisation. (Especially not if you're using
something like Sitecore or are running super lean with WordPress on
inexpensive shared hosting...)

------
tedmiston
I've had several posts hit the front page while running on small boxes,
without issue. At peak I saw 150-200 simultaneous users in Google Analytics.
Sitting at the top of the front page for several days would probably see more
than that. Putting a Cloudflare caching layer in front wouldn't hurt, but I
didn't really need to do anything to prepare.

Alternatively just publish on a hosted blog service like WordPress, Medium,
etc.

(This assumes your content is a post and not an app.)

------
LinuxBender
Static content in a ram disk and haproxy+apache replicas grown as required. No
CDN.

Even the most "dynamic" sites, in terms of non-user-specific content, can be
turned into a static snapshot via a simple cron job. That means the dynamic
content is only ever hit by one client: the cron job.
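
A sketch of that cron-driven snapshot, assuming a ramdisk mounted at /dev/shm
and a placeholder page list (the URL-to-path mapping is deliberately
simplified):

    # Hypothetical snapshot job: fetch each page once, write it to the ramdisk.
    import pathlib
    import urllib.request

    ORIGIN = "http://127.0.0.1:8080"      # the real dynamic backend
    DEST = pathlib.Path("/dev/shm/site")  # ramdisk directory apache serves from
    PAGES = ["/", "/about", "/blog"]      # placeholder page list

    for page in PAGES:
        with urllib.request.urlopen(ORIGIN + page, timeout=30) as resp:
            body = resp.read()
        out = DEST / ((page.strip("/") or "index") + ".html")
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_bytes(body)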

Genuinely user-specific dynamic content must require a login and cookies
before haproxy will even let the connection pass the first stage of
validation.

------
nkkollaw
Happened to me with an article of mine.

It got as high as 3rd place, if I remember correctly.

I got 300 people at the same time for about 10 hours, then high but not crazy
traffic for 2-3 days after that.

300 people at the same time is not a lot.

I had a WordPress website with no cache on a dedicated VPS (nothing crazy),
and neither the CPU nor memory got that busy. 300 concurrent visitors works
out to roughly 1-2 page requests per second, since each visitor only loads a
new page every few minutes. That's not a lot of load for a server.

------
cdevs
My company did some email marketing that would spike traffic to 50,000
visitors in 30-60 minutes, and WordPress's MySQL would go nuts. Who knows how
many horrible plugins marketing had in there, but my first go-to was Varnish,
then Cloudflare, and the final awesome fix was swapping everything to static
HTML that we crawled every night ourselves from our own htaccess-blocked
WordPress. The only thing that had to stay dynamic was the contact form.

~~~
paulcole
>Who knows how many horrible plugins marketing had in there

Why aren't you proactively working with "marketing" to figure out why they
installed those plugins, and helping them reach their goal in another way?

~~~
raarts
I've seen many marketing departments go around IT because they perceived it as
too rigid, slow and inflexible.

------
hoodoof
Is HN front page a large spike in traffic?

~~~
patio11
No.

For the benefit of other folks doing capacity planning: expect a peak load of
~25 requests per second [+] in the first few minutes after your thing hits the
front page, and 50k~80k sessions over the course of a day for a typical
article-style UX, coming both from HN and from Twitter/etc.-assisted
amplification of HN articles.

[+] For your HTML; you'll of course get the usual multiplication for linked
scripts/css/images, but you'll come nowhere close to saturating your uplink
with those, so the limiting factor is almost certainly processing time for
dynamic requests.

------
Shorel
My planned solution: use Jekyll instead of WordPress.

I still haven't finished that migration.

~~~
lois
Surely there are WordPress plugins that handle caching/static page serving? Or
are they insufficient?

Otherwise, did you consider Grav[0] or Zotonic[1]?

0\. [https://getgrav.org/](https://getgrav.org/)

1\. [http://zotonic.com](http://zotonic.com)

