Top of HN traffic peaks at maybe 50-100 visitors per second. Unless you're serving a bunch of data, it really shouldn't be an issue. One problem I've run into is (I suspect) running out of sockets because they weren't getting recycled fast enough. This went away when I switched to Snap or Happstack. I used to have to "fix" this by killing and restarting my server every five minutes.
Do you see a lot of connections in TIME_WAIT? If so I'd just bump up the ephemeral port range via:
/proc/sys/net/ipv4/ip_local_port_range (or sysctl -w)
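For reference, here's roughly how you'd check for a TIME_WAIT pileup and widen the range (the exact numbers below are just an example):

```shell
# Count sockets sitting in TIME_WAIT (ss is the modern replacement for netstat)
ss -tan state time-wait | wc -l

# Inspect the current ephemeral port range (often 32768-60999 by default)
cat /proc/sys/net/ipv4/ip_local_port_range

# Widen the range at runtime (needs root); the range here is illustrative
sysctl -w net.ipv4.ip_local_port_range="15000 65000" \
  || echo "run as root to change this"

# Make it permanent by adding a line to /etc/sysctl.conf:
#   net.ipv4.ip_local_port_range = 15000 65000
```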
But if you're not getting recycling of TIME_WAIT, you can start to change the reuse/recycle attributes (caveat emptor) with:
net.ipv4.tcp_tw_reuse
or
net.ipv4.tcp_tw_recycle
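If you do go down that road, a sketch of the safer option (needs root; caveat emptor as above):

```shell
# Let the kernel reuse TIME_WAIT sockets for new outbound connections;
# this is the safer of the two knobs
sysctl -w net.ipv4.tcp_tw_reuse=1 || echo "run as root to change this"

# tcp_tw_recycle is NOT recommended: it breaks clients behind NAT,
# and the knob was removed from the kernel entirely in Linux 4.12.
```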
I know I have been benchmarking a servant app by throwing 2048 concurrent connections at it and just bumping up the ephemeral range has been enough for my needs.
I tend to run out of socket FD's or just FD issues a lot quicker than ports.
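For the FD side of it, raising the open-file limit is the usual fix (the 65536 figure is just a common choice, not a magic number):

```shell
# See the current per-process open-file limit for this shell
ulimit -n

# Raise the soft limit for the current shell, up to the hard limit
ulimit -n 65536 || echo "hard limit too low; ask root to raise it"

# To raise it system-wide, add lines like these to
# /etc/security/limits.conf (values here are illustrative):
#   *  soft  nofile  65536
#   *  hard  nofile  65536
```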
Just a thought, but 50-100 requests/second doesn't sound like too big of a deal. I did a quick bench of what I have set up and I get this:
Note, this is a static endpoint, so take all these figures with blocks of salt. But by tuning what I described above, I can have wrk do 2048 concurrent connections reasonably, with just a fair increase in overall latency. YMMV.
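For anyone curious, the wrk invocation for that kind of test looks roughly like this (URL, thread count, and duration are whatever fits your setup):

```shell
# 2048 concurrent connections spread over 8 threads for 30 seconds,
# with a latency distribution printed at the end; the URL is a placeholder
wrk -t8 -c2048 -d30s --latency http://localhost:8080/
```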
tcp_tw_recycle is not what you want. It is only applicable to outgoing connections (i.e. not people connecting to your server), and even in that case provides little benefit over tcp_tw_reuse.
True enough, but it's hard to diagnose what people are having problems with when the wording is this vague. Some apps might have to open connections to a backend, or not, etc...
But you're quite right on recycle for outgoing. To be honest I tend to shy away from adjusting either unless I really need to.
> Top of HN traffic peaks at maybe 50-100 visitors per second.
That's true, but HN posts tend to go viral on other tech news sites. It's likely the majority of the traffic spikes are not originating on this site's link. I could be wrong though.
The worst I've ever had was Reddit/HN/Hackaday. I wish I had taken pictures of the Google Analytics report; there was a huge spike when it hit Hackaday, like thousands of loads in a few minutes (maybe a bunch of RSS readers?), leveling off to a more reasonable amount and then falling off gradually as it was displaced from the top of the page. I think I usually get most of my immediate traffic from HN. Twitter is probably the most benign; it usually seems to cause a very slow burn, with little flare-ups over days or weeks.
Contact Amazon. I go through this every time I switch work servers to a new elb or spin up a new service. They are pretty responsive when it comes to this. Just let them know you are expecting X amount of traffic.
The new load balancers may be a bit better in this regard, though I haven't tried them yet.
You mention that you use a newsletter and weekly blog posts as part of your marketing strategy. Content marketing can be such a time suck. Have you found ways to automate any parts of these processes?
You might want to think about moving your blog to be a static site. Publish it with something like http://stout.is and it will be much cheaper, easier to maintain and will never go down.
I'm using Ember FastBoot to render the site on the server, so it's not quite static. For some reason it's throwing lots of errors and dropping requests right now, even though it didn't when I launched Indie Hackers on HN last month and had about 3x as many requests/second. Frustrating.
Anyway, I'm in the process of just moving it from Elastic Beanstalk to serve it from S3 via CloudFront directly. Waiting for the CloudFront distribution to spin up...
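If anyone wants to do the same, the move can be sketched with the AWS CLI (the bucket name, build directory, and distribution ID below are all placeholders):

```shell
# Upload the built site to S3, deleting remote files that no longer exist locally
aws s3 sync ./dist s3://my-site-bucket --delete

# Once the CloudFront distribution is up, invalidate cached paths
# so the new content gets served
aws cloudfront create-invalidation \
  --distribution-id EXXXXXXXXXXXX \
  --paths "/*"
```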
If you use a service like netlify, you can host a static site for free on a CDN. It's actually really convenient because it can build the site automatically whenever you push to a Github repo, and it can automatically set up HTTPS for you with a Let's Encrypt cert. It's a lot easier (and cheaper [free]) than setting up S3 and CloudFront yourself.
Github pages can do the same thing - it just has fewer features.
Good luck, though it looks like you're making good progress already. Last time I had to wait for CloudFront I sat there thinking "this would be a real pain if I actually needed it now..."
I hope the server gets back up quickly. I'm still getting a no-response error in Sydney. I'd love to read a post mortem about the issue sometime in the future.
Cached version here: https://webcache.googleusercontent.com/search?q=cache:F0QdH3...
Or if you refresh a few times, it should come up.