Hacker News new | past | comments | ask | show | jobs | submit login

I work in the entertainment / ticketing industry and we've been burned badly before by relying on AWS' Elastic Load Balancer due to sudden & unexpected traffic spikes.

From the article: "Elastic Load Balancer (ELB): [...] It scales without your doing anything. If it sees additional traffic it scales behind the scenes both horizontally and vertically. You don’t have to manage it. As your applications scales so is the ELB."

From Amazon's ELB documentation: "Pre-Warming the Load Balancer: [...] In certain scenarios, such as when flash traffic is expected [...] we recommend that you contact us to have your load balancer "pre-warmed". We will then configure the load balancer to have the appropriate level of capacity based on the traffic that you expect. We will need to know the start and end dates of your tests or expected flash traffic, the expected request rate per second and the total size of the typical request/response that you will be testing."

You'd be surprised about how many people don't know this. I had an expectation to scale past 1B users. I was trialling AWS when I realised through testing that it was this way. It could not deal with sudden spikes of traffic.

Suffice to say, I went elsewhere.

A billion users? Are you Facebook or the Olympics?

Neither. But once you start doing something like serving ads. The paradigm shifts. Of course, what I do is a lot more intensive/complex. But I'll say this to get the basics across.

It doesn't take facebook. I'm in a small adtech company. Tens of billions of requests a month is not unexpected.

> Tens of billions of requests a month is not unexpected.

10000000000 / (60 * 60 * 24 * 30) = 3,858 req/sec. That's a pretty good clip.

That's a small adtech company. The larger ones do that per day with some over 50B/daily.

Yep. I spent some time working for one of the largest.

We see 10,000 req/sec on a regular basis.

It's not always _users_, but requests. As companies embrace microservices, I think you'll see a moderately sized application pushing tons of requests over HTTP that would normally have used a different protocol

Where did you go, if you don't mind expanding?

I don't mind. I went with dedicated hosting. I found a supplier which had their own scalable infrastructure. They already had clients which had ad server type applications that scaled into the Billions and could handle traffic spikes. With that type of setup, it was a no brainer.

I'm a sysadmin with over 10 years with Linux. So for me to setup and support servers is pretty trivial.

The agreement I had with the supplier. They managed the network and hardware 24/7. I managed the setup and support of the servers from the OS up. This arrangement worked well and I had zero downtime.

> I went with dedicated hosting

This doesn't get mentioned as much as it should but there are VPS/dedicated providers who are very close to AWS DCs.

Enough so that for many use cases you should have your database in AWS and your app servers on dedictated hardware. Best of both worlds.

Can you share a list of providers that are close to AWS DCs?

Pretty much any data center in Virginia will be close to US-EAST. If you contact them for setting up direct connect pipes they'll also provide you with a list of locations to check out.

You'll have to compare regions depending on providers. Softlayer has pretty good coverage with matching regions and low latency.

  I don't mind. I went with self hosting. I found a supplier 
  which had their own scalable infrastructure.
That's a little vague. By "self-hosting" you mean Linux VMs, like EC2, right, or something more abstracted than that? What supplier?

Sorry, I just updated the post. I meant dedicated hosting. So bare-metal machines.

If you want to know the supplier. They are called Mojohost.


When you need performance, bare metal is always the way to go.

This saying holds such little value for so many engineers. They want uptime, ease of management, and security.

Most people aren't worried about squeezing another 3% performance out of thei servers. In fact I would say the slice-and-dice nature of VMs allows for better overall capacity usage because of over provisioning of resources. How many apps do you know that hover at 0.07 load all day long?

Okay, how's this:

"If you're willing to pay up to a 40% premium for the features cloud providers provide, pay them. If not, go bare metal."

Fair enough.

All they say is it costs 125$. 125$ for what ? They do not mention the specs of the hardware in their website.

If you hadn't been a sysadmin, would still have chosen dedicated hosting? (Given that you have serious scaling requirements, of course). In other words: Would it be realistic to say that a service like Elastic Beanstalk saves on hiring a sysadmin?

Sysadmins / operations people should be able to handle anything below an OS better than your usual devops guys that would be able to build you a variation of EBS and their value further depends upon if your software has special needs that are not suitable for cloud / virtualized infrastructure.

I've heard of many start-up companies save plenty of money using dedicated hosting even without any operations / sysadmin pros around scaling to millions of users when the equivalent in AWS with relatively anemic nodes fared much better. In fact, WhatsApp only had a handful of physical servers handling billions of real users and associated internal messaging and they had developers as the on-call operations engineers.

I'm an ops engineer / developer and I'd use dedicated hosting if success depends a lot upon infrastructure costs. For example, if I started a competitor to Heroku at the same time they did, I'd definitely be having a very careful debate between dedicated / colo hosting and using a cloud provider tied intimately with my growth plans. Many companies have shockingly bad operations practices but achieve decent availability (and more importantly for most situations, profitability) just fine, so even the often-cited expectations of better networks and availability zones may be worth the risks of not caring that much.

We went to Softlayer with their smallest instances running Nginx to load balance everything. Much faster and cheaper.

Why in the world would you assume any off-the-shelf solution would serve a billion users?

Unlike many cloud providers AWS can be setup to serve a billion requests but you need to think that mess out from start to end. You can't setup an elb, turn on auto scale and then go out to lunch.

Why not? That's exactly the use case, if you dont need to prewarm for bursty loads. It'll just be extremely expensive.

Also, as another comment here says, I believe a billion "users" is more like "requests" as users is vague and undefined. A single person could launch 1 or 100 requests depending on the app.

What other vendor did you go with and now looking back was it worth it from a cost & operational perspective?

Why not work with AWS to mitigate such risks now that you know more about ELBs?

This might be of interest, Netflix pre-scales based on anticipated demand: http://techblog.netflix.com/2013/11/scryer-netflixs-predicti...

After testing ELB and seeing the scaling issues, we ended up going to a pool of HAProxies + weighted Route53 entries. Route53 does a moderately good job of balancing between the HAProxies, and the health checks will remove an HAProxy if it goes bad. HAProxy itself is rock solid. The first bottleneck we came across was HAProxy bandwidth, so make sure the instance type you select has enough for how much bandwidth you expect to use.

Do health checks work within a VPC? My understanding was they don't, so this only works for externally facing services.

I agree Haproxy is solid, but ELBs are wonderful for internal microservices.

If you do decide to use Haproxy for microservices internally, I highly recommend Synapse from AirBnB: https://github.com/airbnb/synapse

Ruby, High Availablity and High Scalability? Despite idempotency, I'm not sure how comfortable I am with that.

Synapse is a service discovery framework. Essentially, it just writes HAProxy config files based on discovered upstreams - it does not receive any requests itself. The scalability is handled by HAProxy.

I was under the impression that HAProxy is what it is powering Amazon's ELB service.

I wish Amazon would switch to a 'provisioned throughput' model for ELB like they have for DynamoDB, where you say what level of throughput you want to support and you're billed on that rather than actual traffic. Then they keep sufficient capacity available to support that service level.

So if you expect flash traffic, you just bump up your provisioned throughput. Simple and transparent.

You can contact AWS support if needed, and they'll warm up the ELB ahead of time.



It's not perfect, but works in a pinch.

That would be a very cool offering.

Another gotcha is that ELB appears to load balance based on the IP addresses of the requests... We had private VPC/IP workers talking hundreds of requests per second to a non-sticky-session, public ELB fronted service (... don't ask why ...) and experienced really strange performance problems. Latency. Errors. What? Deployed a second private ELB fronting the same service and pointed the workers at it. No more latency. No more errors.

The issue appeared to have been that the private IP workers all would transit the NAT box to get to the public service and the ELB seemed to act strangely when 99.99% of the traffic was coming from one IP address. The private ELB saw requests from each of the individual IP addresses of the workers and acted a lot better. Or something.

Elbs are one of the known biggest weaknesses of aws...

Their whole position on them is super opaque and prewarming is still an issue.

I'll write more about this later, but so many people have had outages due to aws' inability to properly size these things.

I went to a meetup about 2 years ago and one of the engineers from CloudMine gave a talk about load balancing on AWS. CloudMine ended up dumping ELB for HAProxy to handle their scaling needs.

how does HAProxy compare to OpsWorks? the HAProxy wikipedia page mentions OpsWorks is based on it

Nginx running on a tiny instance can load balance 100k connections at thousands of requests per second. The network bandwidth for the instance will probably be saturated way before the CPU/RAM becomes a problem.

ELB (and most other managed service load balancers) are overpriced and not great at what they do. The advantage with them is easier setup and lack of maintenance.

If you're running a service with hundreds of millions or billions of requests, it's just far more effective in every way to use some small load balancing instances instead. Their Route53 service makes the DNS part easy enough with health checks.

Why do you say they're overpriced? I would say for most apps their downright cheap. Especially since you spend so little time tinkering/monitoring/worrying about them. Most people just want to work on their app not manage Nginx configs.

There is absolutely a tradeoff (as with everything in life) but in the context of this thread talking about scale with 100s of millions of requests, gigabytes of bandwidth and large spikes - it's far better to just host your own load balancers.

Most people (and apps) likely won't hit this scale so ELB is just fine. If you do though, ELB is just pricey and not really that great.

Link to the documentation? I thought this was changed over a year ago to not requiring pre warming?

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact