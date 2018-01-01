> Cloud solutions like elastic load balancing are generally engineered for the average use case. Think about the ways you are not like the average.
This might be the case for some things, but many of Google Cloud's "elastic/serverless" products (Load Balancers, App Engine, Pub/Sub, BigQuery, Datastore, etc.) can truly scale up from nothing to a huge traffic spike without pre-warming or similar preparation.
Also, networking on GCP is vastly superior to AWS. The VPC is set up with sane defaults, and VM instances can communicate privately with other VM instances in any zone or region without needing complicated constructs such as VPNs or NAT instances.
Lastly, Google Cloud is generally cheaper (sustained-use discounts), offers higher performance, and applies the lessons learned from AWS.
The one knock I have on GCP is billing. It is incredibly obtuse. I still have a hard time digesting even the monthly invoices (especially with multiple projects under a single billing account).
Really? We've found it to be much easier. The billing section has up-to-date breakouts and a simple list of transactions for each billing account. There are budget alerts, and the best part is the easy export into BigQuery, which gives you SQL access to all of it. Add in Data Studio and you have BI-style analysis and visualizations too. Better than anything the other clouds have.
Video from GCP Next: https://www.youtube.com/watch?v=qL8kvTb4RbU
By the way, it looks like ALB might be based on nginx, judging by the error pages. I wonder if classic ELB is based on HAProxy.
ugh vendor lock-in is the worst :)
Where I work we are saving ... uh ... a lot of money by migrating our workloads over. And they're faster. It's awesome.
Hopefully GCP will start pushing AWS to lower their prices. They charge so much.
Disclaimer: I work for GCP.
Contrast that with Google, where you start with the confusion that there are two products (Cloud Endpoints, Apigee), and neither really talks about integration with Cloud Functions.
That said, whether that is appropriate depends on your design/architecture. Sometimes you want that distinct separation. But if that's the case, you probably shouldn't be thinking about the low-latency case at all, because you should treat it as a service outside your control that may be (and probably should be) cross-regionally available, so you might even be crossing the country, for all you know. (This presumes latency-based routing with a custom DNS, of course, but for a standalone, client-facing service, that's what you should be doing anyway.)
Obviously there may be a middle ground you want, but I'd say you probably want one of two things: treat it as "your" API, in which case you can be as pragmatic as you want when accessing it, or treat it as a reusable, completely separate service that you are a client of, in which case things like location cannot be guaranteed and latency is a reality.
It's common enough that when I asked Amazon about it, they already had enough requests that it's on the roadmap.
People use API gateways for all sorts of things like orchestration, tokenizing sensitive data, caching requests, etc., that should have the same interface whether used internally or externally. Sure, I could recreate all that functionality for internal use, but that's one of the supposed benefits of an API gateway: turning what was code into configuration. See products like Apigee, Kong, Tyk, etc.
But I -did- mention there were caveats. You're correct that I probably should not have led with "You probably shouldn't" and -then- gotten more nuanced, though. While I agree that optimizing it to not leave the AWS network would be of benefit, if you reach the point where you're calling it via the API Gateway internally, it's a separate service, and if it has -any- high-availability requirements, or ever will, you should not rely on traffic staying inside the network (since even without going multi-region with distributed DNS, it may be handled by another AZ, introducing latency).
Products that compete with Amazon's API gateway can do it, though you're right that their product cannot.
Edit: There are also AWS customers that use a different CDN (Akamai, Cloudflare, etc.). Decoupling API Gateway and CloudFront would benefit them.
Integrating API Gateway into our deployment took a little bit of work, but we're happy to have it as part of our automated deployment process. We push updates of our Swagger spec with the AWS CLI to a separate API per environment (dev, staging, prod-west, and prod-east). Each environment has one API gateway with several stages that we use for green/blue deployments. The stages are green, blue, and public. We deploy to the inactive color (green or blue), smoke test it, and promote that stage to public. This simulates the process we use in production, and the swap to public is akin to our DNS swap from one side to the other. We were very happy that it wasn't a huge time investment to fit this workflow into our deployment process.
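The green/blue/public stage swap described above can be sketched as a toy model in plain Python. This is not the poster's tooling and does not call the real API Gateway API: the class `StagedApi`, its methods, and the caller-supplied `smoke_test` function are all hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class StagedApi:
    """Toy model of one API with green, blue, and public stages (hypothetical)."""

    # Which build each color stage currently serves.
    stages: Dict[str, Optional[str]] = field(
        default_factory=lambda: {"green": None, "blue": None}
    )
    # Which color the public stage currently points at.
    active: Optional[str] = None

    def inactive_color(self) -> str:
        # With nothing live yet, start on green; otherwise use the other color.
        if self.active is None:
            return "green"
        return "blue" if self.active == "green" else "green"

    def deploy(self, build: str, smoke_test: Callable[[str], bool]) -> str:
        """Deploy to the inactive color, smoke test it, then promote it to public."""
        color = self.inactive_color()
        self.stages[color] = build
        if not smoke_test(build):
            # Public keeps pointing at the previously active color.
            raise RuntimeError(f"smoke test failed for {build} on {color}")
        self.active = color
        return color

    def public_build(self) -> Optional[str]:
        """Return the build the public stage currently serves, if any."""
        return self.stages[self.active] if self.active else None
```

The point of the design is that a failed smoke test only ever touches the inactive color, so the public stage keeps serving the last good build.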
Somehow we missed the fact that because API Gateway is tied to CloudFront, it does not support multi-region failover. If we have a service outage in our primary region, we swap all of our traffic with DNS to our secondary region (from us-west-2 to us-east-1). To do a regional failover with API Gateway, we would need to deprovision the endpoint api.domain.com in the failing region and provision api.domain.com in our failover region. If we ever have to do a regional failover, I expect there is a very good chance that the request to deprovision the endpoint will fail. See my AWS forum reply and others asking for the same feature: https://forums.aws.amazon.com/message.jspa?messageID=761925#...
My team is very much looking forward to this functionality. We have duplicated our infrastructure across multiple regions to mitigate a region outage. We were planning on relying on a DNS change to route traffic to a new region.
With the limitations that you have described, our failover process is much more complicated and uncertain.
Given two regions, primary (us-west-2) and secondary (us-east-1), we will need to disassociate api.domain.com from the primary API Gateway and wait for that operation to complete. After that, we must associate our secondary API Gateway with api.domain.com. It's very possible, even likely, that the issues on primary that caused us to fail over to secondary will also prevent the disassociation from the primary. This is an extremely undesirable side effect, as it could leave us with unhandled requests or error responses to api.domain.com.
I wanted to post this here so that others can suggest alternative workarounds or ideas. We will also follow up with a support ticket.
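The ordering problem in that failover procedure can be illustrated with a small simulation. This is a sketch under stated assumptions, not real AWS behavior: `CustomDomainRegistry`, `RegionUnavailable`, and `fail_over` are hypothetical names, and the rule that a domain can be attached to only one region at a time is taken from the description above.

```python
from typing import Dict, Set


class RegionUnavailable(Exception):
    """Raised when an operation needs a region that is currently down."""


class CustomDomainRegistry:
    """Toy model: each custom domain may be attached to at most one region."""

    def __init__(self) -> None:
        self.owner: Dict[str, str] = {}
        self.down: Set[str] = set()

    def associate(self, domain: str, region: str) -> None:
        if domain in self.owner:
            raise ValueError(f"{domain} is still attached to {self.owner[domain]}")
        self.owner[domain] = region

    def disassociate(self, domain: str, region: str) -> None:
        # The detach request is served by the region being detached from.
        if region in self.down:
            raise RegionUnavailable(region)
        if self.owner.get(domain) == region:
            del self.owner[domain]


def fail_over(reg: CustomDomainRegistry, domain: str, primary: str, secondary: str) -> bool:
    """Try to move `domain` from primary to secondary; return True on success."""
    try:
        reg.disassociate(domain, primary)  # step 1 depends on the failing region
    except RegionUnavailable:
        return False  # the domain stays stuck on the unhealthy region
    reg.associate(domain, secondary)  # step 2 only runs if step 1 worked
    return True
```

In this model the failover succeeds only when the primary is healthy enough to process the detach, which is exactly the case where a failover is least likely to be needed.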
Curious as to why you didn't opt for RDS for the database side of things? I actually have about 5 side projects as well as my main SaaS, and they all share the same RDS instance for the database which saves me cost while keeping performance high.
Also, do you use Lambda much at all for periodic scripts that need to run? I've more recently been building more and more mini applets on Lambda for things like health checks or replicating data across Amazon services etc. with good results.
Also, and this might be a good complementary service to your monitoring service: I've been using CloudWatch's timed tasks a lot recently to trigger some of those Lambda functions - very quick and easy. I've been using another service to monitor and report on missed triggers, but will look into Cronitor a bit more as a viable option.
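For anyone who hasn't tried it, a scheduled health check of the kind mentioned above might look roughly like this. It's a minimal sketch, not anyone's production code: `HEALTH_URL` is a placeholder, and the extra `fetch` parameter is an assumption added so the handler can be exercised without a network; Lambda itself only passes `event` and `context`.

```python
import urllib.request
from typing import Any, Callable, Dict

HEALTH_URL = "https://example.com/health"  # hypothetical endpoint to probe


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for a GET request to `url`."""
    # urlopen raises for 4xx/5xx responses, timeouts, and connection errors.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def handler(
    event: Dict[str, Any],
    context: Any,
    fetch: Callable[[str], int] = fetch_status,
) -> Dict[str, Any]:
    """Entry point invoked on a schedule; raises if the check fails."""
    try:
        status = fetch(HEALTH_URL)
    except Exception as exc:
        # Raising marks the invocation as failed, so it can be alarmed on.
        raise RuntimeError(f"health check errored for {HEALTH_URL}") from exc
    if status != 200:
        raise RuntimeError(f"health check returned {status} for {HEALTH_URL}")
    return {"url": HEALTH_URL, "status": status}
```

Letting the function raise on failure means missed or failing checks show up as invocation errors rather than being silently swallowed.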
The general suggestion, from AWS themselves, is to try and fit things into S3 first, and only go for RDS (or DynamoDB) if it makes sense to. Barring a clear indicator, S3 will almost always be cheaper, and has no scaling concerns. That's also been our experience; defaulting to S3 has been the cheapest, and required the least maintenance from us, and only when we come up with a good reason (need for complex or high performance queries, generally) did it ever make sense for us to reach for RDS or DynamoDB.
What does S3 have to do with his question?
That's probably fine for lots of stuff, but not everything.
Minutes is a really long time to go down.
If you actually have an in-house solution that provides better availability than Amazon's RDS options, then I recommend you market your engineering solution to the major cloud providers, as they will pay you very well to help them implement your solution to that problem.
Only all the time.
> I recommend you market your engineering solution
I saw exactly that company at the last re:Invent. Can't remember what the name was...
> to the major cloud providers as they will pay you very well to help them implement your solution to that problem.
My guess is that most reasonably sized organizations manage their own databases for uptime-critical systems (case in point). It lets them have more control over backups, when to upgrade the database, how to handle failover, etc.
That's the only way you can get a 10s failover.
I do know that AWS itself documents 60-120s for RDS. http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concep...
The only thing I didn't like was "spend" in the title – "costs" or "expenses" would be less pretentious, to my ears and in my mind. This is a really trivial quibble! [And apparently this is a really old word in English anyways][1]. Now I feel like a grumpy old person.
[1]: https://english.stackexchange.com/a/79055/42347
You mention you use SQS to queue incoming metrics. What server/framework/language are you using to do this? IME using SQS requires tools with good parallelism to deal with its high-latency/high-throughput performance characteristics.
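To make the parallelism point concrete, here is a sketch with a simulated queue rather than real SQS. `SlowQueue` and `drain` are hypothetical names; the fixed per-call delay stands in for network round-trip time, and the batch size of 10 mirrors the per-request message cap on SQS receives. Several pollers overlap their waits, so throughput scales with the number of workers even though each call is slow.

```python
import queue
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


class SlowQueue:
    """Stand-in for a high-latency queue: every receive call costs a fixed delay."""

    def __init__(self, messages: Iterable[str], latency: float = 0.05, batch_size: int = 10):
        self._q: "queue.Queue[str]" = queue.Queue()
        for m in messages:
            self._q.put(m)
        self.latency = latency
        self.batch_size = batch_size

    def receive(self) -> List[str]:
        time.sleep(self.latency)  # simulated round trip
        batch: List[str] = []
        while len(batch) < self.batch_size:
            try:
                batch.append(self._q.get_nowait())
            except queue.Empty:
                break
        return batch


def drain(q: SlowQueue, handle: Callable[[str], None], workers: int = 8) -> int:
    """Run `workers` concurrent pollers until the queue is empty; return the count."""

    def poll() -> int:
        count = 0
        while True:
            batch = q.receive()
            if not batch:  # a real consumer would keep polling instead of exiting
                return count
            for msg in batch:
                handle(msg)
                count += 1

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(poll) for _ in range(workers)]
        return sum(f.result() for f in futures)
```

With one worker, draining N messages costs roughly N/10 round trips back to back; with eight workers those round trips overlap, which is why a runtime with cheap concurrency matters here.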
[0] https://segment.com/blog/spotting-a-million-dollars-in-your-...
Moneymaking idea for someone: a devops person, a lawyer, and an accountant get together and charge companies to predict their cloud bill.
Now I'm not trying to show off or say we're doing things better. If anything, maybe this post can help me convince my cofounders that we should spend more on infrastructure hosting (there's lots of things we could improve). I'm mostly curious what's a typical ratio in other smallish startups.
As Shane mentioned in the post, 12.5% has been a consistent number for us as we have scaled up over the past couple of years. That said, at this point I suspect we will see this percentage decrease a bit going forward. I'm basing this on the intuition that we're a bit overprovisioned at the moment and won't have to scale our infrastructure linearly with user growth over the next year or two. Of course, that remains to be seen...
To add some more numbers to the conversation: my full-time job is CTO at a consumer-facing tech company (Babylist.com). I just looked at our spending on IT/infrastructure for the last couple of months, and it's around 1.5-2% of revenue.
On a side note: when people talk about x-figures revenue, are they typically talking about x-figures monthly or yearly?
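For comparing these percentages across companies, the arithmetic is just spend divided by revenue over the same period. A small sketch, with made-up dollar figures chosen only to reproduce the 12.5% mentioned above (the function name is hypothetical):

```python
def infra_share_percent(infra_spend: float, revenue: float) -> float:
    """Infrastructure spend as a percentage of revenue over the same period."""
    if revenue <= 0:
        raise ValueError("revenue must be positive")
    return 100.0 * infra_spend / revenue


# Hypothetical figures, not taken from any company in this thread.
monthly = infra_share_percent(infra_spend=1_250.0, revenue=10_000.0)
yearly = infra_share_percent(infra_spend=12 * 1_250.0, revenue=12 * 10_000.0)
```

As long as both numbers cover the same period, the ratio comes out the same whether you use monthly or yearly totals.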
To give an example, I think we spend a similar amount or even more on customer.io (email marketing automation) than we do on our entire hosting. You could argue that this cost is a marketing cost rather than technology though. But currently we still attribute it to IT.
However, when it comes to low-traffic side projects and experiments where costs of a few hundred dollars matter, I prefer Linode or DigitalOcean with Ansible provisioning and B2 for object storage. It will cost you more time for sure, but it will give you better performance at the low end, which means you can break even sooner. If it takes off, you can always migrate to AWS, GCP, or Azure later.
The finest granularity I can find in the AWS web console is per day; many times I've butterfingered an input and only caught it a few days after the fact due to the unexpected bill increase.
Unfortunately, to make meaningful use of this you pretty much have to roll your own infra to download it, load it and analyze it. :(
No matter what graph format I choose (between line and stacked area), all I'm getting is a single dot at 00:00 AM today for $0.35, which is the correct amount for cumulative spending this month. But it doesn't seem to be updating or graphing it correctly.
With cloud providers pushing serverless, it will get even darker: there is no way to get control, just hope that everything will be greener.
I will add, we've paid when we've had to do several manual db upgrades in that time:
- Original m3.medium on non-provisioned iops
- Upgrade to m4.large
- Upgrade to an EBS with provisioned iops
- Upgrade to m4.xlarge
These are easy enough, but not nearly as easy as an RDS upgrade is.
Reserved instances have saved us a ton of money, even with my screw-up last October.
You should only use T2 instances within an auto-scaling group.
Are there any other technical changes you've made to your application code specifically in response to AWS costs? Any other recommendations when designing a new application?
aws.amazon.com/well-architected
disclaimer: work for AWS and author