What's unfortunate is that in the first day after setting up the ELB we didn't have problems, but soon after we started getting reports of intermittent downtime. On our end, our metrics looked clean; the ELB queue never backed up seriously according to CloudWatch. But when we started running our own health checks against the ELB, we saw what our customers had been reporting: in the crush of traffic at the top of the hour, connections to the ELB were rejected, despite the metrics never indicating a problem.
Once we saw the problem ourselves, it was easy to understand. Amazon provisions that load balancer elastically, and our traffic was more power law than normal distribution. We didn't have enough baseline traffic to earn enough resources to service peak load. So, a cautionary tale: don't just trust the built-in instruments when it comes to cloud IaaS -- you need your own. It's understandable that we ran into a product limitation, but unfortunate that we weren't given enough visibility to see the obvious problem without our own testing rig.
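A minimal sketch of the kind of independent health check described above (the endpoint, timeout, and cadence are placeholders, not our actual rig):

```python
import time
import urllib.request

def probe(url, timeout=3):
    """Return True if the endpoint answered 2xx within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except Exception:
        return False

def run_checks(probe_fn, attempts, interval=0.0):
    """Run probe_fn repeatedly; return (successes, failures)."""
    ok = bad = 0
    for _ in range(attempts):
        if probe_fn():
            ok += 1
        else:
            bad += 1
        time.sleep(interval)
    return ok, bad

# e.g. run_checks(lambda: probe("https://example.com/health"), 60, interval=1)
```

The point is just to probe through the same front door your customers use, rather than trusting the provider's own metrics.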
GCE's load balancer does not use independent VM instances for each load balancer, instead balancing at the network level. So you can instantly scale from 0 to 1M req/s with no issues at all.
ALB pricing is strange too, classic AWS style complexity.
Google runs a single gigantic distributed load balancer and simply adds some rules for your specific traffic to it. All of the compute and bandwidth behind this load balancer is available to help serve your traffic spike.
GCP LB design details: https://cloudplatform.googleblog.com/2016/03/Google-shares-s...
One of the major downsides of Amazon AWS.
What I think you are indicating is that you have a very unusual pattern that ELB is not set up to handle: you go from base to peak load in seconds flat, or even less. That's interesting and quite unlike the very common model of human visitors to a website ramping up gradually, which ELB is likely designed around.
My biggest issue with ELB is how long it takes for the initial instances to get added to a new ELB. It takes f-o-r-e-v-e-r... I've seen it take as long as fifteen minutes, even with no load. I'm hoping ALB fixes that.
The vast, vast, vast majority (seriously, probably 95-98%) of companies do not build out the AWS infrastructure required to remain highly available, with failover and on-demand auto-scaling of all services, that would make AWS the go-to choice. I continue to come across individuals who maintain the fantasy that their business will remain online if a nuclear bomb wipes out their primary data centre. Yet they all deploy to a single availability zone, the same way you'd deploy a cluster of servers anywhere else. I never cease to be amazed at businesses that spend $10k+ a month on AWS that would cost them half that with a colocated deployment.
- About a month ago, our database filled up, both in space and IOPS required. We do sizeable operations every day, and jobs were stacking up. I clicked a couple buttons and upgraded our RDS instance in-place, with no downtime.
- We were going through a security audit. We spun up an identical clone of production and ran the audit against that, so we didn't disrupt normal operations if anything crashy was found.
- Our nightly processing scaled poorly on a single box, and we turned on a bunch of new customers to find that our nightly jobs now took 30 hours. We were in the middle of a feature crunch and had no time to re-write any of the logic. We spun up a bunch of new instances with cron jobs and migrated everything that day.
100% worth it for a small business that's focused on features. Every minute I don't mess with servers is a minute I can talk to customers.
We don't have to bother ourselves with managing SANs, managing bare metal, managing hardware component failures and FRUs, managing PDUs, managing DHCP and PXE boot, managing load balancers, managing networks and VLANs, and managing hypervisors and VMs. We don't have to set up NFS or object stores.
Being on a mature managed service platform like AWS means that if we want 10 or 100 VMs, I can ask for them and get them in minutes. If I want to switch to beefier hardware, I can do so in minutes. If I want a new subnet in a different region, I can have one in minutes. There's simply no way I can have that kind of agility running my own datacenters.
Nobody disputes that AWS is expensive. But we're not paying for hardware or bandwidth qua hardware or bandwidth - we're paying for value added.
I still think the benefits of AWS are over-emphasized within most businesses. Of the 4 companies I've worked for that used AWS, 3 of them did absolutely nothing different than you'd do anywhere else. One-time setup of a static number of servers, with none of the scaling/redundancy/failure scenarios accounted for. The 4th company tried to make use of AWS's unique possibilities, but honestly we had more downtime due to poorly arranged "magical automation" than I've ever seen with in-house. I suppose it requires a combination of the AWS stack's offerings and knowledgeable sysadmins who have experience with its unique complexities.
Disclaimer: I'm a developer rather than a sysadmin, not trying to justify my own existence. :p
We started Cronitor as a side business and we've grown by seeking out the highest leverage usage of our time. My cofounder wrote a blog post about this https://blog.cronitor.io/the-jit-startup-bb1a13381b0#.kwbpma...
Also I'll add here to another point made below: I don't blame the ELB for not being built to handle our traffic pattern, despite the fact that websites are probably a minority on EC2 vs APIs and other servers. My specific critique is that none of their instrumentation of the performance of your load balancer indicates to you that there is any problem at all. That is... unfortunate.
The appropriate word to describe 8 requests/s is "nothing". Health checks and monitoring could do that much by themselves when there are no users.
200 requests/s is a very small site.
To give you some point of comparison: 200 HTTP requests/s could be processed by a software load balancer (usual pick: HAProxy) on a t2.nano and it wouldn't break a sweat, ever.
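For reference, a minimal HAProxy setup for that kind of load might look something like this (server names, addresses, and the health-check path are illustrative, not from the thread):

```
# /etc/haproxy/haproxy.cfg -- minimal sketch
global
    maxconn 4096

defaults
    mode http
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend www
    bind *:80
    default_backend app

backend app
    balance roundrobin
    option httpchk GET /health
    server app1 10.0.1.10:8080 check
    server app2 10.0.1.11:8080 check
```

Even a config this simple handles a few hundred requests/s without noticeable load on a tiny instance.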
It might need a micro if it's HTTPS :D (that's likely to be generous).
To be fair, I hardly expect any performance issues from the load balancer before 1000 requests/s. The load is too negligible (unless everyone is streaming HD videos).
All the answers about scaling "ELB" are nonsense. There is no scale in this specific case. The "huge" peak being referred to would hardly consume 5% of a single core to be balanced.
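Back-of-envelope, assuming roughly 250 microseconds of CPU per proxied request (an assumed figure, not a benchmark):

```python
# Rough core-utilization estimate for a software load balancer.
# cpu_per_request is an assumption for illustration, not a measured number.
peak_rps = 200            # the "huge" peak from the thread
cpu_per_request = 250e-6  # seconds of CPU per proxied request (assumed)

utilization = peak_rps * cpu_per_request
print(f"{utilization:.0%} of a single core")  # 5% of a single core
```

Vary the per-request cost by an order of magnitude either way and you still land nowhere near saturating one core.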
I used to criticize ELB a lot and avoid them at all cost. So do many other people on the internet. But at your scale, all our hatred is irrelevant, you should be way too small to encounter any issues.
N.B. Maybe I'm wrong and ELBs have gotten so buggy and terrible that they are now unable to process even a little traffic without issues... but I don't think that's the case.
* ALB: Application Load Balancer
* ELB: Elastic Load Balancer
I have seen Application Elastic Load Balancer/AELB, Classic Load Balancer/CLB, Elastic Load Balancer (Classic)/ELBC, Elastic Load Balancer (Application)/ELBA.
In any event, I think it is great that AWS is bringing WebSockets and HTTP/2 to the forefront of web technology.
At a previous employer, we punted on ever using ELBs at the edge because our traffic was just too unpredictable.
Combining all of the internet rumors, I've been led to believe that ELBs were/are custom software running on plain EC2 instances in an ASG or something, hence being relatively slow to respond to traffic spikes.
Given that ALBs are metered, this seems to suggest shared infrastructure (bin-packing people's ALBs onto beefy machines), which makes me wonder if that is how it actually works now, because the region/AZ-level elasticity of ALBs could actually help the elasticity of a single ALB.
If you don't have to spin up a brand new machine, but simply configure another to start helping out, or spin up a container on another which launches faster than an EC2 instance... that'd be clutch.
Waiting for AWS to embrace IPv6.
In my experience, the number of customers asking for a feature matters more than the size of those customers.
EDIT: And while you're listening: AWS documentation is a mess in the sense that it's way too unorganized; it might be documented but one cannot find it easily.
> Starting June 1, 2016 all apps submitted to the App Store must support IPv6-only networking.
(To configure an ECS service to use an ALB, you need to set a Target Group ARN in the ECS service, which is not exposed by CloudFormation)
We're using CloudFormation and ECS heavily for Convox, and just can't get off a CloudFormation custom handler (Lambda func) for managing ECS task definitions and services for small reasons like this.
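For anyone curious, the custom-resource pattern mentioned above looks roughly like this in a template (resource names and the handler packaging are illustrative, not Convox's actual code):

```yaml
Resources:
  ECSServiceHandler:            # Lambda that calls the ECS API directly
    Type: AWS::Lambda::Function
    Properties:
      Handler: index.handler
      Runtime: python2.7
      Role: !GetAtt HandlerRole.Arn
      Code: { S3Bucket: my-bucket, S3Key: handler.zip }

  WebService:
    Type: Custom::ECSService    # CloudFormation invokes the Lambda above
    Properties:
      ServiceToken: !GetAtt ECSServiceHandler.Arn
      Cluster: !Ref Cluster
      TaskDefinition: !Ref TaskDef
      # The property plain CloudFormation doesn't expose:
      TargetGroupArn: !Ref AppTargetGroup
```

The Lambda receives create/update/delete events and makes the ECS API calls that CloudFormation can't express natively.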
I excitedly set up an ALB as soon as I read the post, because I've needed it, only to find that support for what I want isn't available to me yet!
Weird thing to highlight when the product being announced doesn't even have that feature.
>5 connections/second with a 4 KB certificate, 3,000 active connections, and 2.22 Mbps of data transfer.
"2KB certificate" and "4KB certificate"? Is this supposed to read "2048 bit RSA" and "4096 bit RSA"?
Edit: On a second read, it's less clear if header based routing is actually available yet...
"each Application Load Balancer allows you to define up to 10 URL-based rules to route requests to target groups. Over time, we plan to give you access to other routing methods."
The ALB clearly has technical access to the headers, but use of them isn't exposed to users yet.
I guess the tradeoff is that with ELB/ALB, like most PaaS, you don't have to "manage" your load balancer hosts. And it's probably cheaper than running an HAProxy cluster on EC2.
But for the power you get with HAProxy, is it worth it?
Does anyone have experience running HAProxy on EC2 at large scale?
I have swapped out ELB for HAproxy and/or nginx on a couple of occasions. If you know your load and feature requirements intimately, you might be able to do a better job. But it's work.
Nginx was a cluster of machines that did rule-based routing into the EC2 machines. Now that the ALB has some of those capabilities, it's time to evaluate it.
The hourly rate for the use of an Application Load Balancer is 10% lower than the cost of a Classic Load Balancer.
They frequently introduce new features while cutting costs.
Good thing ELB is still here, so you can choose between them depending on your workload.
 LCU - Load Balancer Capacity Units
That said, I don't dispute that there might be use cases where classic ELB is a better option. And I'm glad it's still available (as opposed to ALB replacing classic).
I was trying to secure an API Gateway backend using a client certificate, but found ELB doesn't currently support client-side certificates when operating in HTTP mode.
There was this complicated Lambda proxy workaround solution but I gave up halfway through...
We have ALB working already.
Disclaimer: I work on Convox.
This ALB announcement + the nicer ECS integration could tip the balance though.
Any thoughts on how likely it is that Kubernetes can/will take advantage of ALBs (as Ingress objects, I suppose) soon?
And there is support for wildcard certificates, *.example.com
You can request a cert through AWS Certificate Manager with multiple names, more info https://docs.aws.amazon.com/acm/latest/userguide/gs-acm-requ...
This is not using Server Name Indication (SNI).
Yes, that's not possible, as EV certs are not issued for wildcards.
My counter is that EV certs are for chumps and the entire concept is a scamola. The only justification I'd accept for getting one is proper A/B testing showing that an EV cert leads to increased revenue. There's no inherent security argument for them.
For a single connection the websocket will always go to the same back end regardless of sticky sessions being enabled.
If stickiness is enabled and the same client creates a new websocket, it will go to the same back end as previous connections.
Did you have to do anything special to get this to work?