AWS is fine, as long as you don't rely too much on the web interface.
The AWS web interfaces are inscrutable, clunky, and slow. A lot of the trial and error you want to do to figure out how things work in AWS is very difficult to do in that interface. Often, the interface isn't even a win: it will just dump you into a textarea where you have to fill in some structured command-line-style text anyway.
Instead of trying to deploy through the AWS web interface, build some simple tooling (use the AWS cli, or Python boto, or whatever) and then do trial-and-error using those tools. A lot of things in AWS make a lot more sense when you work directly with the API, rather than the web interface.
I usually click around in the web interface to learn what the options are, then automate the same procedure using a boto script. It's not trivial, but then you never have to do this task manually again. Also, I find the scripts work wonderfully to document the company's IT infrastructure.
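As a concrete sketch of that approach, here's what a tiny script wrapping the AWS CLI might look like. The tag names, region, and task (finding tagged EBS volumes) are made-up examples, and it assumes a configured `aws` CLI:

```python
import json
import subprocess

def describe_volumes_cmd(tag_key, tag_value, region):
    """Build an `aws ec2 describe-volumes` invocation filtered by tag."""
    return [
        "aws", "ec2", "describe-volumes",
        "--region", region,
        "--filters", f"Name=tag:{tag_key},Values={tag_value}",
        "--output", "json",
    ]

def tagged_volume_ids(tag_key="Backup", tag_value="daily", region="us-east-1"):
    """Run the CLI and pull out the volume IDs (requires `aws` to be configured)."""
    out = subprocess.check_output(describe_volumes_cmd(tag_key, tag_value, region))
    return [v["VolumeId"] for v in json.loads(out)["Volumes"]]
```

Trivial, but it's version-controlled, repeatable, and doubles as documentation of what the infrastructure actually is.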
AWS's success comes largely from startups and from Amazon's ability to make setup easy and fast. Yes, it can get complex, like anything, but by default it is pretty simple. They do have many, many services, but you only really need a handful, or a couple, depending on the project. Maybe they need to separate those huge lists of features/systems into buckets: simple, personal, small-to-medium, enterprise, etc.
Look at the pricing page, it's ridiculous for someone new:
I have to google what "ECU" is, I have to figure out what "Variable" ECU means, I have to figure out what "EBS Only" means. There's not even a tooltip to explain any of it.
Even after googling "ECU", and going to Amazon's FAQ page, it's still not clear exactly what I'm paying for.
And that's just the purchasing part.
The situation is analogous to "base-load vs peak-load" on the power grid. Base-load power (reserved instances) are very cheap but cannot respond to transient spikes in the load. Peaking plants (on-demand instances) can respond to transient spikes quickly, but are expensive to run.
A naive model is that you find your minimum usage and buy enough baseload to cover that, with peaking beyond that. However, since peaking is so much more expensive than baseload, this is not necessarily optimal - instead, you may want to buy some extra baseload that isn't fully utilized during your off-hours, because it offsets expensive peaking capacity during your high-demand hours.
You almost certainly would not want to have zero reserved instances.
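A toy model makes the trade-off concrete. The prices below are invented, not real AWS rates; the point is only that "over-buying" reserved capacity can beat reserving just the off-peak minimum:

```python
# Illustrative prices only - not real AWS rates.
RESERVED_HOURLY = 0.05   # effective hourly cost of an always-on reserved instance
ON_DEMAND_HOURLY = 0.12  # hourly cost of an on-demand instance

def daily_cost(reserved, hourly_demand):
    """Cost of one day: `reserved` always-on instances, on-demand for overflow."""
    cost = 0.0
    for demand in hourly_demand:
        cost += reserved * RESERVED_HOURLY
        cost += max(0, demand - reserved) * ON_DEMAND_HOURLY
    return cost

# 12 off-peak hours needing 2 instances, 12 peak hours needing 10:
day = [2] * 12 + [10] * 12
naive = daily_cost(2, day)    # reserve only the off-peak minimum
better = daily_cost(10, day)  # "over-buy" reserved capacity
# Here the bigger reservation is cheaper overall, despite idling off-peak.
```

With these (made-up) numbers, reserving for peak costs $12.00/day versus $13.92/day for the naive plan, even though eight reserved instances sit idle half the time.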
If you're paying for X1 servers for an extended period of time on an hourly basis, you should be entitled to a thank you note from Jeff Bezos.
Recently, though, I tried setting up a simple home "VM lab": basically a two-node cluster of regular workstations, running bare-metal hypervisors (e.g. Xen, VMware ESXi, etc.).
When you do this, suddenly everything IaaS providers do comes into stark clarity.
For example, you realize that with even two nodes of scale, that sticking a lot of storage into each VM node is expensive, especially if you're not going to use it all on each node; it's a lot cheaper to have diskless VM hosts (save for an SSD for swapfiles), and stick the disk volumes on a SAN, connected to the VM hosts by something like iSCSI. (This then gives you other benefits, like disks that are copy-on-write clones of other "template" disks on the SAN; the ability of VMs to "fail over" to another VM host (and thus the ability of the host to gracefully restart without killing its VMs—at least, from a user perspective); "free" snapshots with rollback; etc.)
EBS is exactly such a setup—the perfectly-sensible setup for any VM cluster operating at scale. It's the other setup—local disks in the VM hosts—that's insane and doesn't scale; everyone is just familiar with it because most people's experience of providers is with the ones just getting off the ground who are before the ROI intercept.
Or, for another example, when running a heterogeneous VM cluster, you immediately realize that the unit in which CPU resource commitment is measured on a VM host is a "vCPU" - which is not exactly one reserved physical core (it can represent just a hyperthread, and can be overcommitted when other VMs aren't using their resources) and is not the same on each host, since each core runs at a different speed.
The closest you can really get to getting a useful unit of CPU allocation, then, is by rounding off the host differences to integer multiples—such that you could say that one vCPU on host A is (roughly) equal to two vCPUs on host B. That unit, whatever you call it, will then scale up over time, as you phase out previous generations of VM host hardware. That's an ECU.
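The normalization described above fits in a few lines. The baseline and benchmark scores here are invented; the point is just that a compute unit is a rounded-off ratio against a reference core:

```python
BASELINE_SCORE = 100.0  # benchmark score of the reference core ("1 unit")

def compute_units(core_score, n_vcpus):
    """Express n vCPUs on a host whose cores benchmark at `core_score`
    as an integer multiple of the baseline reference core."""
    return round(core_score / BASELINE_SCORE * n_vcpus)
```

So if host A's cores score 200 and host B's score 100, one vCPU on A and two vCPUs on B both come out as the same number of units - which is exactly the property a cross-hardware pricing unit needs.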
Smaller hosting companies can get away with telling you what CPU you'll be running on, because they are new enough to have homogeneous hardware, and small enough not to attempt anything like live VM migration. A company that has heterogeneous hardware and must migrate VMs across it can't make any such claims.
"I can handle basic Linux devops. But AWS is not Linux
devops. AWS has become its own specialty. With over 70
services offered, only a full-time AWS specialist can
properly understand how to build something in AWS."
AWS documentation is absolutely terrible (incomplete, out of date, etc.), the APIs are often inconsistent or confusing, official SDKs are half-baked (weird error messages, no error messages, etc.), and there are tons of overlapping products.
If Amazon actually made some effort on making AWS more user-friendly, it wouldn't be nearly as time-intensive for a small team to figure it out.
We are now looking for a devops person simply because the AWS stack in its entirety is daunting. Yes, from experience it is possible to manage with just the software guys. However there's just too much going on. The more you use of AWS, the more difficult and demanding it becomes.
For all that was said, we don't regret moving to AWS (were previously on Rackspace).
You might find BOSH useful at this level. It was developed to deploy and upgrade Cloud Foundry, which itself is a complex distributed system with heavy HA requirements. It's a large part of what has made it possible to run CF on OpenStack, vSphere, AWS, Azure and GCP. (There are more coming).
There's a hump to get over, but once you do, BOSH is essentially magical. At Pivotal we upgrade our entire public cloud every week or two and mostly, nobody notices.
Disclaimer: I work for Pivotal, we do almost all the core engineering on BOSH. Microsoft and Google do some too for Azure and GCP respectively.
Not doubting your decision, just curious for more info.
AWS has a lot of redundancy baked in. The notion of "replace, don't fix" also made our codebase more resilient to failure. This is also true for most cloud suppliers.
With AWS we don't need a PostgreSQL admin, it's managed. Redis is managed. Storage is cheap and can be replicated across the globe. So far we've reaped all the benefits without a devops person in-house.
It was already expensive.
Terraform was what made AWS possible for us. I would not have used the GUI.
Build for redundancy, replace resources when they fail (rather than necessarily trying to fix them), and you're good. AWS has some learning curve in the form of its ACLs, APIs, and how the pieces fit together, but the result is that you have excellent fine-grained control over your stack.
You can outsource that knowledge to a platform provider, but you're gonna pay for it. Rackspace won't even sell you cloud hosting without a service contract.
I call bullshit. Startups have/need/want experienced and skilled devops engineers. AWS launched in 2006: there are available, talented devops engineers who understand it.
The devops needs of small startups aren't "minimal". Even if you go with a devops engineer not intimately familiar with AWS, he or she can get up to speed on AWS platform operations just as quickly as learning any other new stack the startup may be using... If you're planning on hiring for the lowest common denominator for devops in your startup, you'll be in trouble.
But I will mention that we were not always on AWS. Indeed, I hired Chris at a moment in our company's life when we were growing a bit too fast and still had a "snowflake server" setup in Rackspace Cloud. Served us well for the first year of revenue, but not so well given our growth rate. One of his first jobs was to move us from snowflake to proper config management. Then we made an 18 month pitstop in colo before adopting AWS.
I think what the blog post fails to recognize is that there is "commodity AWS use" and then there's "AWS platform use." For example, at Parse.ly we use Route53, CloudFront, ELB, EC2, S3, RDS, and EMR.
The first 5 are basically commodity services. DNS, CDN, load balancer, VMs, backup.
Whether you use Rackspace, DigitalOcean, or your own colo, you'll need to figure something out in each category to deploy a real webapp. (Or, accept the devil's bargain of a PaaS.)
The last two services, RDS and EMR, are basically AWS-specific value-adds that save our team time. We used to run our own Postgres EC2 node, but meh, why bother. Our SQL DB is small, and RDS handles a lot of things for us. We also used to run our own Spark clusters (using spark-ec2), but in June 2015, Spark got EMR support, so again, that saves us time/money. We actually breathed a sigh of relief when that came out, since we were essentially building Spark EMR ourselves with boto.
Spot instances and reserved instances are very neat cost optimizations at scale. But complex.
We adopted RDS/EMR hesitantly because of the "lock-in" they represent. We could definitely run without them. I don't think I have interest in the other 70 AWS services, mainly because I hate lock-in, I like open source, and I want to Keep Things Simple. (So Chris stays sane!)
AWS is definitely more complex than DigitalOcean. But I think running production web apps in the cloud is "simply complex", no matter which provider you use.
I also think the OP has a good point, which amounts to, "premature scaling is the root of all evil". Parse.ly may run on 200 EC2 nodes across multiple regions & AZs today, but our first prototype ran in a single 1U rackmounted server I built myself and snuck into a friend's server cage (2009). It's important not to waste time on "devops" when you are still figuring out what the heck your startup is even supposed to be doing. If I wasted time on that in the early days, we simply wouldn't be here.
DevOps has become "developers doing operations," but that's a misapplication of the term. DevOps started as developers and operations working together.
If you're running this many EC2 nodes and you don't have an operations person or team, then you're running on the razor's edge.
>At the time of our conversation I had crashed an AWS instance and we were having trouble fixing it. Among the points I made to Sean was that the problem was specific to AWS. If we’d been in the Rackspace cloud, I simply would have rolled back to yesterday’s image. A problem that would have taken 5 minutes to fix instead dragged on for 2 weeks.
Honestly you have already been given the sign post that this could go very badly for you. Imagine the case where "crashed instance" was expensive critical infrastructure. How many hours or days of outage before your business is irreparably harmed?
Lambda isn't, uh, quite as fully featured as Heroku or CF yet. And it ties you very closely to AWS. With Heroku you have a reasonable shot at migrating to Cloud Foundry or vice versa.
Disclaimer: I work for Pivotal, we're the major donors of engineering to Cloud Foundry. We sell a commercial distribution (PCF) and have a public service (Pivotal Web Services).
Yeah, spinning up an Elasticache instance with Redis has some complexity...it's a managed service. You have to tell it some things for it to be managed for you.
If you don't want to learn about it, you just spin up an EC2 instance (which itself is not really any more complex than spinning up a VM in any other virtualized platform), and install Redis onto it. If your experience is in Linux sysops, you can -totally do that-.
That's really the key thing about most of these cloud providers...the barrier to entry really is about learning how to spin up a basic VM. All the other offerings, as complicated and unintuitive as they may be, are opt in things, designed to solve common needs. You want a scaling NoSQL DB? You can spin up EC2 instances and install your favorite flavor. It's then on you to ensure it scales properly. Or, you can read up on Dynamo, determine whether it fits your needs, then do the grunt work of setting it up in AWS, and let Amazon manage it for you. Etc. Complaining that taking advantage of ~all~ the offerings is a huge mental burden seems silly; of course it is. THERE ARE SO MANY OF THEM. But nothing is stopping you from just using EC2 to start, and using those other services only as you find the time and drive to learn about them and determine that you can benefit from them.
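As a sketch of that "just spin up an EC2 instance and install Redis yourself" path, using boto3 and a cloud-init user-data script - the AMI ID and instance type are placeholders:

```python
# Shell script run by cloud-init on first boot (assumes a Debian/Ubuntu AMI).
USER_DATA = """#!/bin/bash
apt-get update
apt-get install -y redis-server
"""

def redis_instance_params(ami_id, instance_type="t2.micro"):
    """kwargs for boto3's ec2.run_instances: one box that installs Redis at boot."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        "UserData": USER_DATA,
    }

def launch_redis_box(ami_id, region="us-east-1"):
    import boto3  # imported here so the param builder works without boto3 installed
    ec2 = boto3.client("ec2", region_name=region)
    return ec2.run_instances(**redis_instance_params(ami_id))
```

That's the whole mental model: one API call that stands in for racking a server, and a shell script you already know how to write. Everything past that is optional.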
For those of you saying, "it makes getting started dead simple and gives you room to grow," you're also missing the point. Something like Heroku or LSQ is probably a better option for 90% of such use cases.
For those of you saying, "oh, just use Amazon ABC for x feature, DEF for y, GHI for z, etc. and tie it all together with Amazon foo," you're also missing the point. Yes, that's the proper way to use AWS, but it requires very specific (i.e. not common Linux sysadmin) knowledge that early-stage startups may not possess and may not have the resources to acquire. This is the main takeaway from the article that I'm not sure many folks actually read...
Once you do learn their names come the inconsistencies. Why are Dynamo and Redshift not part of RDS while Aurora is? There's just no rhyme or reason to any of it. No unifying vision. Someone at Amazon needs to take the time just to organize it into logical collections.
> Why are Dynamo and Redshift not part of RDS while Aurora is?
Because Aurora is a relational database (see above), and Dynamo and Redshift are not?
Of course, it would seem that some marketing weasels managed to get into the organization; new names are fizzy opaque brands instead of the old descriptive-ish naming.
Dynamo and Redshift are database services, but they're not much like RDS.
Of all the AWS database services and permutations, and there are many, RDS is the only one you really should know.
My perception of AWS is that it is an infrastructure environment onto which you can build your own hosting environment. AWS isn't just designed for web based applications, even though most of the AWS users will use it for that purpose. It's a box of parts with which you can piece together the environment you need and that requires skills, knowledge and a good grasp of how it all works (on an AWS specific level).
Another issue I keep running into is that developers use some of the AWS services as integral parts of their application, SQS is a great example of that. I'm convinced that SQS is not for developers to use on an application level but it is a service that allows infrastructure engineers to build complex and scalable environments on AWS.
The other issue I always struggle with when using AWS is cost. You never quite know how much you're going to pay at the end of the month. For small startups the difference between $500 and $1,000 is massive, whereas for larger organisations the difference between $15,000 and $20,000 is less of a problem.
Convox (YCS15) is addressing this head on and helping lots of small teams immediately get the best out of AWS.
It's open source, built by a core team and a growing community of AWS specialists.
It fully automates AWS setup and deploys and infrastructure updates so you and your team don't have to.
Disclaimer: I work at Convox full time
I have some static pages that I used to host on a shared hosting service. I moved them to S3 years ago and haven't looked back. Zero sysadmin hassle, and the cost is actually less than I was paying for the shared hosts (which would invariably fall over if I got a link from a high-traffic site).
Edit: yes, at some traffic level it's going to become more economically efficient to run your own server, but that level is pretty damned high. Note how many gigs of S3 traffic you can purchase for the cost of one sysadmin salary (hint: a lot of them). And if you're doing it yourself (the "small team" that the article is talking about) your time is probably better spent working on your product than on spending your days monitoring security mailing lists and screwing around with apt-get or whatever.
That would be a total vendor lock-in architecture, but I'd never have to think about configuring redis clusters, failover, backup or anything. I would have no idea what technology stack is below it. It scales, and it just works. The only devops work would be to create a deployment script that would automate a deployment from sourcecontrol. The initial setup of the whole infrastructure would cost a few days maximum, and none of the hard operations problems would be mine. I'd click boxes like "Geo redundant availability: yes/no".
I think that's the benefit of these expensive cloud services. If you do low level stuff yourself on expensive cloud providers like AWS or Azure, you are doing it wrong.
Still, I was able to get my first app up and running on Elastic Beanstalk and a MySQL database up via RDS without much trouble. It's not rocket science.
Given the breadth and depth of available services, I can't imagine how you can declare "AWS is inappropriate for small startups" with a straight face.
I cannot understand how any half-competent startup dev team would find it difficult, most of it can be done with point and click UI, especially for setting up your basic load balancer + web servers + database stack.
(No Perl, Python, Ruby, Go, etc.)
I tried this when I first experimented with AWS after I read the story behind it, i.e., the directive Bezos allegedly gave to disparate groups within Amazon to make their data stores accessible to each other.
The AWS documentation claimed everything could be controlled via HTTP. Great. I know HTTP. Sign me up.
I have no trouble interacting with servers via HTTP using the Bourne shell and UNIX utilities, without using large scripting languages. I have been doing so for many years.
But after a few hours trying to get AWS to work using UNIX it was such a PITA I gave up. And I do not give up easily.
But it turned out there were small errors in the documentation, so even if one followed their specification to the letter, things still would not work.
The Amazon developers in the help forums would just say use the Java programs they had written.
Of course AMZN had a "web interface" from Day 1. But I have little interest in another hosting company with a web GUI.
At the time all Amazon offered for anyone interested in the command line was Java. Installing OpenJDK and a hefty set of "Java command line tools" just to send HTTP requests? This did not inspire confidence.
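Part of what made "just HTTP" so painful is that every request to the old query APIs had to be signed. A stdlib-Python sketch of the (now legacy) Signature Version 2 scheme gives a feel for what you'd otherwise be reimplementing in shell with openssl and sed - and where small documentation errors become hours of debugging:

```python
import base64
import hashlib
import hmac
from urllib.parse import quote

def sign_v2(secret_key, host, path, params):
    """Compute the Signature parameter for an AWS SigV2 GET request.

    Params must be percent-encoded and sorted byte-wise by key; get any
    of this slightly wrong and the API just rejects the request.
    """
    canonical = "&".join(
        f"{quote(k, safe='')}={quote(v, safe='')}"
        for k, v in sorted(params.items())
    )
    string_to_sign = "\n".join(["GET", host, path, canonical])
    digest = hmac.new(secret_key.encode(), string_to_sign.encode(),
                      hashlib.sha256).digest()
    return base64.b64encode(digest).decode()
```

(Modern AWS uses Signature Version 4, which is more involved still: date-scoped derived keys, hashed payloads, canonical headers.)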
Then came Python. Everyone loves AWS. How can anyone criticize it?
I concluded that if AWS was well-designed (according to Bezos alleged directive) then it would be possible to interact with it without having to use a large scripting language and various libraries.
I guess I am either too stupid or I set the bar too high.
AWS, as I understood it back then (before the massive growth), is a wonderful idea but I am not sure the implementation was/is as wonderful as the idea.
Oh yes, to squeeze the most out of it, I am sure it does.
But you can get along with EC2, Route 53 etc. pretty fine.
Personally, I found the multiple terms, the (outdated) docs, and the stories of people getting billed lots pretty intimidating. But when it came down to it, it wasn't as scary as I'd made it out to be, and I've set things up at about half the cost of DigitalOcean and others.
Any recommendations? Have spoken with LogicWorks, 2ndWatch and RackSpace so far.
Though the boys there also have experience with many systems.
Disclaimer: Previously employed there.
I'm technically proficient, but I want easy.
For microservices architectures, having just gone through this, my choice was Google Cloud with Google Container Engine (Kubernetes). You build Docker images, deploy them, specify how those resources are connected, how many copies of each container should be deployed, etc.
I didn't have to mess with Terraform, Chef/Puppet/Salt/Ansible, deploying Kubernetes, Swarm, or any other container management system. Because it's just Kubernetes, it's easy enough to bring on-premise or deploy on AWS.
The main draw toward Amazon are all of the other services that they offer, of course. I see a lot of users of Kinesis, Redshift, etc. If you need them, then you need AWS, but deploying and managing your own apps brings many more barriers, IMHO.
My workflow for deploying a new app for my own projects is to cd into the source directory and type
Interestingly, I used to be much more bullish on moving from the buildpacks model to a container model a la everyone else in the PaaS space. To the point that I presented a case internally that buildpacks should essentially be pushed aside in favour of that model.
But once I started actually writing Dockerfiles I changed my mind. It's a bloody inconvenience. Composition is not really A Thing in Docker, you can only stack. For all the talk about elegance (and it's elegant as an implementation), you're left with what is essentially a single inheritance hierarchy with no clean way to compose containers.
The ability to get your app running immediately is something Heroku pioneered and Cloud Foundry continued (although, OK, if you insist on hurting yourself that way, you can push docker images too). Once you hand off the frankly boring business of automatically building a container to the platform, containers as the unit-of-deployment lose all of their magic.
But again. I'm biased. I've only seen both alternatives up close in my day job and in my personal projects.
Ingestion is either timed jobs in lambda or an elastic beanstalk set of processes doing the same. I'm not sure what the ingestion is so I can't say much here.
Data gets tossed on a kinesis queue. If possible have the data being ingested tossed on the kinesis queue. Hook lambda workers to the kinesis queue or an elastic beanstalk layer. Kinesis here removes the need to manage a redis cluster. One layer of devops removed.
Next you have the workers, the easiest is to toss them in lambda... they'll be invoked to read from the kinesis stream. They'll scale up and down. You don't need to run a supervisor, it's baked into the system. Removing lots of dev ops here, no complicated deploys, no cluster management, etc. If you need 2 different sets of workers you can set a split join set of kinesii... or I believe you can have 2 readers from a kinesis stream they just don't like it if you have lots.
[Another variant is just a cluster of Elastic Beanstalk hosts scaling on the side of the kinesis stream]
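A minimal sketch of one of those Kinesis-triggered Lambda workers. The payload format and the downstream write are assumptions; Kinesis hands the handler batches of base64-encoded records:

```python
import base64
import json

def handler(event, context):
    """Lambda entry point: Kinesis delivers batches of base64-encoded records."""
    processed = 0
    for record in event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        # ...do the real work here: transform, then write to MongoDB/RDS/Dynamo...
        processed += 1
    return {"processed": processed}
```

Note there's no supervisor, no deploy pipeline beyond uploading the function, and scaling up and down is the platform's problem, which is the whole point being made above.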
They'll shove data into MongoDB... you keep this because you want it. Otherwise use RDS or Dynamo if it's suitable. I can't tell.
Of course you pay for some of this removal of dev ops. Some of it is free. Lambda limits some of your language choices and is mostly stateless so you may hit some services for things like lookup harder.
But the code that is running on lambda can get moved into worker fleets managed by you later if you need more control.
The point is here that the interesting parts of your app become the actual code dealing with your specific problem not the devops. Eventually if you have to, you build up more devops and more custom solutions to handle different scale up to meet the scale or price part of the problem. But the solution above will go bigger than most people need and lets you not have to spend hundreds of hours on all that devops at a real cost of engineer time.
There are costs everywhere, money, time. Just decide how you want to use them.