Perhaps the first AWS mistake you might make is... using AWS? Even before I started at Linode, I thought it was terrible. It's extremely, unreasonably pricey. The UI is terrible. Their offerings are available elsewhere. I started MediaCrush, a now-defunct media hosting website, on AWS. After a while, we switched to dedicated hosting (really sexy servers). We were looking at $250 a month and scaled up to millions of visitors per day! I ran our bandwidth and CPU usage and such numbers through the AWS price calculator a while ago - over $20,000 per month. AWS is a racket. It seems to me like the easiest way to burn through your new startup's seed money real fast.
Edit: not trying to sell you on Linode, just disclosing that I work there. There are lots of options, just do the research before you reach for AWS.
Yes, Linode beats AWS when it comes to price. On the other hand, Linode's offer is incredibly basic and simplistic. AWS offers service after service that Linode simply doesn't and realistically cannot.
Sure, you can emulate a subset of these services (and a subset of their features) using open-source software, but at what price? That's the major flaw in your reasoning. Getting to the point where your installations are as stable and reliable as AWS', given a large stress on the system, will cost you a lot of money and time. Directly comparing the cost of hardware access ignores other costs and headaches that are not very easy to estimate.
There's a market for a service like Linode and there's a market for a service like AWS. You've simply never worked on a project/system that works better on AWS than on Linode. I know I couldn't run the systems that I currently operate on Linode without multiplying the workload that is needed to maintain them.
Linode and AWS are competitors but there's space in the market for both; they simply fill different niches. Establishing one as absolutely superior to the other is silly and closed-minded. A lot of people chose AWS; go and ask them why (feel free to reach out to me at nick at nasx dot io - I'll be more than happy to talk to you).
If we keep attacking the people for their honest disclosures, we're only discouraging them from expressing their conflict of interest in the future. Please. Don't.
Using locust.io, I've seen that my current site on two $10/month Linodes can scale up to approximately 300k people/day and increasing that substantially just means I press a button and upgrade my app/database servers.
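To put that "300k people/day" figure in terms of request rate, here is a rough back-of-the-envelope sketch. The requests-per-visit and peak-factor values are my own assumptions for illustration, not measurements from the site above.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rps(visitors_per_day: float, requests_per_visit: float) -> float:
    """Average requests per second if traffic were spread evenly over the day."""
    return visitors_per_day * requests_per_visit / SECONDS_PER_DAY


def peak_rps(visitors_per_day: float, requests_per_visit: float,
             peak_factor: float) -> float:
    """Crude peak estimate: the average rate scaled by an assumed peak factor."""
    return average_rps(visitors_per_day, requests_per_visit) * peak_factor


# Assumed inputs: 10 requests per visit, peak traffic at 3x the daily average.
avg = average_rps(300_000, 10)     # roughly 34.7 requests/second
peak = peak_rps(300_000, 10, 3)    # roughly 104 requests/second
print(f"average: {avg:.1f} req/s, assumed peak: {peak:.1f} req/s")
```

Even with generous assumptions that is a modest request rate, which is consistent with two small servers being able to handle it.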
If it came to a point where I was growing at a pace I didn't want to manage, money was flowing in, and I was out of ideas on software optimization, only then would I consider spending tens of thousands/month on AWS.
I'm not denying the great benefits AWS gives, I honestly would love to use it now and just be done with most of my devop headaches, but the costs are prohibitive.
Picture a continuum between brain-dead simple websites and business-critical complex websites:
simple: static website, WordPress blog
moderate: small business CMS, etc
complex: Netflix, AirBNB
On the other end of the spectrum, you want to run a high-availability website with failover across multiple regions like Netflix. You need the value-added "services" of a comprehensive cloud provider (the "I" and "S" in "IaaS" as in "Infrastructure Services"). For that scenario, there are currently 4 big competitors: AWS, MS Azure, Google Compute Cloud, and IBM SoftLayer. However, many observers see that Google and IBM are not keeping pace with AWS and Azure on features so at the moment, it's more of a 2 horse race than a 4.
Keep in mind that the vast majority of cost comparisons showing AWS to be overpriced are based on comparing Amazon's EC2 vs bare metal. The EC2 component is a small part of the complete AWS portfolio. If you're doing more complicated websites, you have to include the costs of Linux admins + devops programmers to reinvent what AWS has out of the box. (The non-EC2 services.) Even if you use OpenStack as a baseline for a "homegrown AWS", you'll still need extensive staffing to configure and customize it for your needs. It may very well turn out that homegrown on Linode is cheaper but most articles on the web do not have quality cost analysis on the more complicated business scenarios. Anecdotes yes! But comprehensive unbiased spreadsheets with realistic cost comparisons?!? No.
I use it for dev / early projects and as things get complex or need more redundancy I make the production spend on AWS.
Just saying that calling 90% of Google Cloud's products far better than AWS's (or any equivalent provider's, for that matter) is bold.
And no the ability to run it yourself on Google is not the same.
Here's a material answer - https://cloud.google.com/docs/google-cloud-platform-for-aws-...
In short, Google has parity with AWS on many fronts, and exceeds AWS on many others. Only material AWS advantage at this point is full IAM. Biggest thing you gotta remember is AWS is stuck in "VM" world and only slightly deviates from that. Google's advantage is in its fully managed services, which AWS does poorly (don't tell me Redshift is "fully managed").
Google has Bigtable, which AWS has no competitor for. Google's Pubsub is vastly superior to SQS. Google doesn't need Firehose because Google's services scale to Firehose levels without needing a new product and a new price. Google has Zync transcoder service.
Google has BigQuery, which is vastly superior to Redshift in price, performance, scale, and manageability. Google has Dataflow, which AWS has no competitor for.
Even for VMs, Google offers better networking, faster disks, more reliability (live migration), better load balancer, etc.
And to put the cherry on top, Google's no-commitment, no-contract price is only beaten if you lock yourself into AWS's 3-year contract.
(disclaimer: work on BigQuery)
You know what would be a cool idea: if Google developed a method of copying an Amazon Machine Image (AMI) to Google Cloud. That would give us an easy way to try out our same servers at Google without having to rebuild everything, as we do not use containers yet.
I think the net cost of an extra person (a server admin) on a startup team is more than the cost of the server.
If anything, I see a lot of people choose AWS for 'scalability' concerns when they never end up needing to scale.
The main advantages I see are:
Security - AWS has security zones, firewalls, etc. for free
Momentum - constant new features (Apache Spark, Lambda, SQS)
The only points you even remotely bring up are vague and wishy-washy and could easily be applied to any statement, even outside of programming.
I consult / contract for a Fortune 50 customer that pays an insane amount of money to AWS per month and it seems ridiculous what they're paying for and what they get. But do you know what's even more ridiculous? Paying three times as much but getting even worse reliability and a massive amount of time wasted while waiting weeks to provision servers. One group internally used to have massive outages daily resulting in needing to hire a dedicated team of about 6 just to handle that load of production incidents (terrible reliability also resulting in loss of revenue too) and since migrating their software with even the most bone-headed SPF-everywhere AWS based software architecture haven't had more than one or two per week for outages now. The cost benefits get better the more incompetent / incapable your internal IT organization is. The amount of wasted resources due to internal bureaucracy, legacy, and working with companies that don't really practice any form of technology at scale has been an incredible amount of savings for my customer.
And knowing how much my customer relies upon AWS support for the most menial of tasks (primarily a cultural thing with how they treat their vendors as well as internal resources) I am sure that nobody but massive companies that deal with external bureaucracies effectively could deliver the size and kind of support team to handle the volume of support requests generated by my customer for the most ridiculous of tasks (I've seen executives file AWS support requests to reboot VMs at 3 am because the bureaucracy is extremely brittle here and reflects into the software systems reliability). We may throw millions / month at AWS, but Amazon has to hire at least a few dozen engineers and several account managers just to handle support alone which I might argue could be a net loss to handle the customer.
Most vendors that work with my customer have a very hard time with them, because they approach sourcing much the way Walmart does and abuse their vendors to the point where the vendors abuse their own employees.
Doesn't this fall outside Amazon's area of responsibility?
At least the documentation I read and a few AWS promotional talks I attended seem to suggest the contrary.
Or have I understood this completely wrong, and Amazon is now also providing managed IT services?
But when you pay AWS so much they are pretty beholden to your requests and can wind up becoming a lot like an MSP. I've spent countless hours with those poor guys waiting on a traceroute failure or for mtr or sat to give some anomalous event that everyone suspects is something wrong in AWS when I've insisted AWS is almost never at fault because our incompetence is almost certainly the problem.
The AWS support team's worst engineer is probably at least in the top 25% of our support engineers I'd argue so I'm sure if my customer had the option they'd want to pay to have them troubleshoot our own internal networks but that's what my team is for I guess.
Yes. There will always be a less stupid solution out there that is not as bad as the bad solution.
Or, you could fix IT. But that takes thought.
Easier to give money to a vendor you can blame...?
In a lot of pathologically-managed companies or divisions, there is effectively zero chance of an employee being able to change the pathology.
- It is too hard to figure out that you offer hourly billing, I found it by accident, and a year too late.
- I tried out your Nodebalancers, as I thought they would be a nice help for my stack, but I ended up building my own HAProxy solution since the performance wasn't good. When I pay for a dedicated load balancer, I would expect it to be top-tuned from your side.
- Your API is a mess, and so is your documentation. Everything is GET-requests. And creating a new instance, shouldn't take 11 requests, and a dance to set every little detail correctly. And to be fair, your documentation doesn't even document that, this is a subset of what I use: https://gist.github.com/kaspergrubbe/756f0a227db8aeb92818 I even managed to find some errors in the documentation when I went through it, so it didn't feel very polished or tested, but maybe it isn't used a lot. It also seems strange that to emulate one or two clicks in the interface, needs 11 requests through the API.
- Your advanced settings are extremely hard to find, especially "Edit Configuration Profile" and the "Auto-configure Networking"
- Elastic IPs. I feel like it should be easier to swap IPs around Linodes, without me using heartbeatd, and some strange configs. This is much easier with Amazon, and one of the reason why I prefer them.
- Anycast routing. Linode has several datacenters, it would be awesome if my setup in the US could have the same IP as my setup in the UK, because right now I am doing ugly DNS hacks to send my customers to the closest datacenter. Is this planned?
- Lack of an object store, even though I use Linode, I am still using AWS S3, which means added latency, and I have to pay Amazon for outside traffic, you can't even compare with AWS on these things without starting to have these things.
Can I setup a Linode through the API with private networking enabled?
I am a happy customer, and I loved your KVM upgrade, I love your prices, and I love your service. But it would be nice to see more innovations happening.
> It is too hard to figure out that you offer hourly billing, I found it by accident, and a year too late.
It's still a pretty new feature, only about 18 months old, and their price sheet has clearly listed e.g. ".06/hr to $40/mo" for quite a while now. I vaguely remember getting an email and seeing a blog post about it when it was announced...
> When I pay for a dedicated load balancer, I would expect it to be top-tuned from your side.
I think it is top-tuned, but only on $20/month worth of hardware and networking. ;-) They explicitly say in some blog posts to add more NB and scale out if you need more concurrent connections. I agree that this is a little alarming, and I'd be willing to pay more for beefier NBs—just like I do with regular nodes. Building out your own HAproxy nodes is a fine solution and what I plan to do if this becomes an issue for us.
> Your API is a mess... It also seems strange that to emulate one or two clicks in the interface, needs 11 requests through the API.
It's not the greatest API in the history of computers but it gets the job done and I found the documentation to be more than adequate. It's bare-bones and you're going to want an abstraction over it, whether you write your own (like I did), use their CLI, or use Salt or Knife or what have you. Whether it's one API call or 100 doesn't matter if all you have to do on a regular basis is `linode-create foo` or `ansible-playbook create-linode.yml bar`.
Anyway, I just ran through a quick build using the web interface and it takes like 10+ clicks to build a node with the default disks, no private interface, no label, no forward or reverse DNS, not booted, etc. For this to take four API calls to allocate the node, create root and swap volumes, and build the config doesn't seem like a big deal to me. Tack on another four API calls to set the label and display group, set the DNS, add a private IP, and boot the node. I'm not seeing the problem here.
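To illustrate the "abstraction over it" point, here is a minimal sketch of a wrapper that hides the eight calls behind one function. Every action name and parameter below is a made-up placeholder except `linode.ip.addprivate`, which is the only action named in this thread, and even there the parameter name is a guess. The transport is injected, so nothing here talks to the real API.

```python
from typing import Callable, Dict, List

# A transport performs one API request for an action and returns a dict.
ApiCall = Callable[[str, Dict[str, object]], Dict[str, object]]


def create_node(call: ApiCall, label: str, hostname: str) -> List[str]:
    """Run the whole build sequence and return the list of actions issued."""
    used: List[str] = []

    def step(action: str, **params: object) -> Dict[str, object]:
        used.append(action)
        return call(action, params)

    # First four calls: allocate the node, create disks, build the config.
    # (Hypothetical action names, not the real API.)
    node_id = step("node.allocate")["node_id"]
    step("disk.create_root", node_id=node_id)
    step("disk.create_swap", node_id=node_id)
    step("config.create", node_id=node_id)

    # Next four calls: label, DNS, private IP, boot.
    step("node.update", node_id=node_id, label=label)
    step("dns.set", node_id=node_id, hostname=hostname)
    step("linode.ip.addprivate", node_id=node_id)  # parameter name assumed
    step("node.boot", node_id=node_id)
    return used


def fake_call(action: str, params: Dict[str, object]) -> Dict[str, object]:
    """Stand-in transport for trying the wrapper without any network access."""
    return {"node_id": 1}


steps = create_node(fake_call, "foo", "foo.example.com")
print(len(steps), "calls hidden behind one function")
```

Once something like this exists, the number of underlying requests only matters for latency and failure handling, not for day-to-day use.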
> Lack of an object store
I mean, sure, Linode lacks a lot of the things that AWS provides. They also don't provide a database aaS or a queue aaS or a CDN aaS or... you get the idea. I don't think they're trying to compete with Amazon at this level. Where do you draw the line between IaaS and PaaS? Can you implement your own object store using some open source software just like you run your own HAproxy and database instances?
> Can I setup a Linode through the API with private networking enabled?
Yes, but it's a separate API call. :-) https://www.linode.com/api/linode/linode.ip.addprivate
> I am a happy customer, and I loved your KVM upgrade, I love your prices, and I love your service. But it would be nice to see more innovations happening.
Agreed on all points. My innovation wishlist includes VPC (I really just want my own private VLAN) and more flexible node configurations (another commenter mentioned that you can't add disk without adding CPU and RAM, and vice versa, and this is a huge pain point). I haven't had to deal (yet) with IP swapping and anycast but I can see those becoming issues for us in the future. And I actually do agree that Linode should implement some kind of an object store to compete with S3... I would categorize that as IaaS, not PaaS, and it's the most glaring hole in Linode's offering right now.
If you are going to end up re-building AWS's services from scratch on top of what are essentially naked linux machines (or doing your integration work between services from many vendors), you are doing a huge amount of work that isn't necessary and will be a lot more expensive than that $20k/ month you could have been paying to Amazon.
Aren't there tools that can provide a similarly high level/seamless/polished experience running on your own computer, using your choice of cloud provider as the backend? If not, why not?
(I wouldn't know - I don't even run a website, and if I did, I would avoid the locked-in services like the plague, out of principle. But I'm curious.)
No, you need many more than that. AWS's offerings are complex systems, each backed by multiple large teams of developers and operations people. And that's not just because they have tons of customers.
You want to run a partitioned, replicated data store with five nines of reliability? Good luck doing even just that with two people. Throw in metrics (CloudWatch), a queuing system (SQS), monitoring/failover with on-demand provisioning (EC2 + auto scaling groups), and you're talking a lot more than just two devops people... well, unless you want them both on-call 24/7/365, and you want them to burn out in a matter of weeks.
AWS isn't perfect by any means, and you can certainly argue that it can be overpriced... but maybe you pick and choose, and use AWS for the things you don't have expertise in or time for, and roll your own for the rest. The nice thing is that AWS lets you offload all that stuff when you're small and don't want to or can't retain that kind of expertise in-house, and then you can focus on actually building your awesome product that runs on AWS's plumbing. And when you have more time and people later, you can look into doing things differently to save money, piece by piece, when it makes sense to do so.
1. The "Bus Factor" - https://en.wikipedia.org/wiki/Bus_factor
2. Context switching - if you have a developer that also works on coding or other things.
So maybe it's not a full $240k/year for only 1 service, but the price in the door IS that. 2 to 3 extra services will cost no more - that's where AWS will make the money.
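A rough sketch of that "price in the door" argument: self-hosting is modeled as a fixed staffing cost that does not grow with the first few services, while managed services are billed per service. The $240k/year figure comes from the comment above; the per-service fee is purely an assumed number for illustration.

```python
import math


def self_hosted_cost(staff_cost_per_year: float, n_services: int) -> float:
    """Fixed entry price: the same staff run one service or a few of them."""
    return staff_cost_per_year if n_services > 0 else 0.0


def managed_cost(fee_per_service_per_year: float, n_services: int) -> float:
    """Managed services are assumed to be billed per service."""
    return fee_per_service_per_year * n_services


def break_even_services(staff_cost_per_year: float,
                        fee_per_service_per_year: float) -> int:
    """Smallest number of services at which managed costs at least as much."""
    return math.ceil(staff_cost_per_year / fee_per_service_per_year)


STAFF = 240_000   # per year, from the comment above
FEE = 60_000      # per service per year, an assumption for illustration

for n in range(1, 6):
    print(n, self_hosted_cost(STAFF, n), managed_cost(FEE, n))
print("break-even at", break_even_services(STAFF, FEE), "services")
```

The model ignores the point at which the fixed staff can no longer cover more services, which is where the "10 admins vs 15" style of argument takes over.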
At scale, the arguments about needing 10 admins vs needing 15, for example, start making sense.
This covers not just running the services, but keeping up to date with security updates, scaling the systems, feature additions, bug fixes and keeping up with the state of the art.
Services like RDS are an amazing net cost saving for small to medium users. Something like Cloudfront would cost you a million just to get going. Raw EC2 is nearer break even.
We've used CDNs in the past, at way below a million/year for a medium, growing startup. Care to elaborate?
I'm involved in two different startups, one on Linode (because that's all it can afford) and the other on AWS. For its current offering, Linode is great for MVPs, and if you like your servers as pets, but it's doubtful we'd stay with Linode once we start having to scale.
This is fundamentally not understanding what AWS is.
- You could say "EC2 is just Xen". But next time there is a 0-day exploit, I'll have AWS working all weekend to patch my servers. And Xen still doesn't have an API for scaling physical hardware.
- You could say "S3 is just Apache". But I will never see a "disk full" message, I will never get paged if something is borked (but it will still get fixed), I will never worry about DDOS attacks, etc.
> Once you implement it yourself on Linux, the ongoing cost drops.
That's like saying "it's cheaper if you change your own oil". Might be true, but doesn't matter. I'm still taking my car in. Lots of other people do. You might try asking them why.
Outsource if you want. When your bill hits $500k/mo and you realize you're paying for things you can do yourself, your position may change.
Have you been through an AWS outage and been paged? It happens. When you realize your business depends on an opaque organization you may want to diversify.
If you're small it makes sense to outsource sometimes. Not always and not forever.
I'm using them in prod already, and trying to create a similar setup on real hardware for enterprise customers using pg, pgpool2 and haproxy is already a pain, and yet you couldn't autorecover them as conveniently when they go belly up.
We are using MySQL RDS, but are migrating to self-managed Cassandra. We use RabbitMQ instead of SQS. We use Hive/Presto instead of Redshift.
The key point is, running those things yourself, you are likely to lose all the nice failover mechanisms/monitoring/auto-snapshotting stuff that AWS offers. To live up to that, you will need not only an extensive understanding of the software you are running, but also a considerable amount of your time dedicated to the Ops side, which can lead to some really big frustration from time to time. In that sense, I don't think you can re-IMPLEMENT something while accepting too many compromises.
So my suggestion is: for small, non-critical projects you can use Linode, but for anything serious you'd better use something that is reliable.
I'm not saying you're wrong, just that your analysis is incomplete.
That's more than enough reason for serious customers to avoid Linode...
I just did this migration (non-AWS -> AWS), and we saw a ~20% reduction in monthly prices with 1-year contract at AWS vs. physical hosting (admittedly we were getting terrible pricing with our physical hosts, but still). This will be reduced with the addition of t2.nano servers as we run enough t2.micro instances for little odds and ends that it will put a dent in the bill.
Still, I agree with the parent's general sentiment re price. The gap used to be much worse, but it's still anywhere from 2x-20x depending on what you're doing.
- EC2 prices are simply high and exclude things like bandwidth.
- The CPUs are usually 1-3 generations behind (they've gotten better with this).
- Virtualization adds overhead (again, this can be very significant if you're doing a lot of system calls)
- IO options are significantly worse
- Their network connectivity is, at best, average
You said it yourself, you're comparing a "terrible" deal to AWS and only saved 20%. Sounds to me like AWS is only 20% better than terrible ;)
I'm not going to argue with your points about AWS itself, as they are certainly valid. However, in my specific setup (everything in production is c4/r3 types, with 1 TB gp2 volumes for all volumes, and network benchmarks run to ensure everything is properly sized) I've actually had a pretty big performance gain over dedicated servers in a DMZ behind a hardware firewall.
With regards to the bandwidth, it seems a common issue is when you have very chatty services (Apache Kafka) deployed across multiple AZ's the bill gets real big real fast. Thus far, we're not even cracking 3-digits in bandwidth monthly. Maybe we're simply not at a scale to notice these problems yet.
- Can Linode offer short-term compute or memory intensive service? e.g., If I want to consume 100 teraflops 24x7 for about a couple weeks, or maybe 10 teraflops but a terabyte of memory. (Pay-as-you-go, not monthly/yearly subscription)
- On top of that, Can I do above with my IPython notebook code, which is already setup (it uses numpy at the backend so python slowdown is not a problem), with a step-by-step process of how to make it run on Linode (without having to install python scientific stack)?
These are some of the use cases that I think are getting popular these days, and for which AWS is known for, e.g., in the areas of machine learning, 3D rendering, etc.
Their service limits start low and ratchet slowly, if they have the capacity to give.
All these features and elasticity sound nice until you get big and they don't deliver.
I don't know what an AWS TAM is but the SAs have tried and failed.
Make a new account and try to run 1000 c4.4xl in parallel for 1 hour. It won't happen.
That's not what happened the five times, but it's illustrative.
And who defines what is a need and what is a want?
We currently use a mixture of root servers (DB) and VPS (App, HAProxy, NGinx) at a hoster.
* We use a Docker/Consul/Nomad/SaltStack setup to install new servers for replacing broken ones and growth
* App setup is Redis/Disque, Postgres(DB, BI) and app servers
* New dedicated servers we get in around 1h, VPS in around 1min (Only VPS over API)
* We have around 0.2 Ops FTE (devs rollout apps, ops for security updates, new infrastructure etc. and incidents)
* Price seems to be way below Amazon
With all those things that you have in place, AWS would not be a big win for you, though you still might find some use of AWS services like S3 for reliable object storage, Route53 for DNS, or Lambda for "server-free" event handling. The unique thing about AWS is the huge portfolio of services they have and the absolutely amazing rate at which they manage to pump out new (and useful!) ones.
The Salt is off the shelf, with some bash scripts on top to auto-generate some config and make installation of a new server one script call.
We surely could put endless hours into Ops, playing around with stuff etc. but the benefit would be marginal.
Our DNS setup is simple, not much to say: we use DNS Made Easy for failover. We use S3, though I do not consider it "AWS" (S3 is comparatively cheap).
I've wondered about Mesosphere with autoscaling etc., but when adding a new VPS takes seconds, the two weeks needed to set up Mesosphere/Kubernetes would take quite some time to amortize.
Cloudwatch, SQS, SNS, Dynamo, S3 etc offer good managed services that you can plug into your applications/systems with not that much effort - and crucially very little OPS needed.
I see your characterization of AWS above as "terrible", but I'm curious how that squares with the (probably very well-informed) decisions these people made to use AWS.
- Know how to maintain a server by yourself
- Hire someone who can
AWS removes part of that and thus people pay for it. An average (good enough) sys admin costs the same as a programmer.
The reply by kawsper to your comment highlights several drawbacks of Linode. Specifically, this feedback, "Your API is a mess, and so is your documentation. Everything is GET-requests. And creating a new instance, shouldn't take 11 requests", alone, if true, is sufficient for me to not consider Linode for hosting. Being in an ultra-small team of 2, we can't afford the luxury of API requests failing so many times.
Having used AWS since 2010, and since 2007 for backups (S3), I can say that your comparison of $20,000 on AWS vs $250 is wrong. Have you assumed a very foolish selection by the user, wherein the user just picks up all of AWS's services (e.g. RDS for a DB, etc.)?
Just as a fact, we don't use RDS but install MySQL ourselves, as it allows us to have other things on that instance and also comes out cheaper in comparison. On this note, my contribution to the OP (article) would have been: bundle different services on an instance, rather than buying stereotyped instances. Of course, YMMV.
The biggest advantage is for ultra small teams, where you have comfort of a stable environment, with respect to instances launching; APIs working; volumes getting attached/unattached; snapshots getting taken; backups done on s3. All taking place automatically, while you rest peacefully.
I am not arguing that apples to apples Linode (or some other service) will not be cheaper than AWS. I am sure it will be. But still many people would like to stick with AWS because of stability and maturity of its cloud.
Also, you've got to give credit to AWS for pioneering hourly billing and disrupting the cloud environment. Although I mainly agree with you on the cheaper point, when I moved to AWS from dedicated hosting in 2010, my monthly bills went down.
Lastly. The experimentation. The wide variety of instances it has from micro (to the recently launched nano) to ultra large instances, allow you to experiment a lot. For example: in the past few months: I moved from couple m3.large to couple c4.xlarge, while experimenting in between on c3.xlarge type of instances.
Finally, when you make such a blanket statement, you not only demonstrate your naivete but also undermine the decisions of thousands of satisfied AWS users.
All said, I would like you (Linode) and others to be a good alternative. I am glad you are there. This is to keep AWS on its toes. Which is in my interest as their customer.
PS: Lest my handle make someone think I represent AWS: please be assured I do not. I just created this handle when I had to ask an AWS-specific question (you can check my first post under this handle), and then continued with it for other things.
Edit: minor correction
Really? I didn't get that at all.
What I saw was a post giving negative opinions on various aspects of AWS, and stating that AWS' offerings are available elsewhere. There was an example given, of a previous project where he'd switched away from AWS to dedicated physical servers (at an unnamed provider). There was no suspicious boosterism/puffery in the direction of his own employer, like you generally see when a shill is trying to get away with something.
In fact, it seems like the only real mention he made of linode was in the disclosure that he kindly added, pointing out that he worked there and was therefore biased. Most ads don't bother to do that.
If this counts as an ad, then by those standards nobody could ever post in any thread covering the industry in which they work.
I find there are a lot of high-level abstracted tutorials, but for the new services, there aren't a lot of detailed tutorials.
For instance, an implemented cognito->gateway->lambda->dynamodb is really hard for a newbie to do.
I think there's a big need for a crash course for devs that starts with all the crap they previously ignored or had someone else do for them and I say this as someone who has always written code first and done sysadmin second.
I agree partially. In the end, the documentation is what you are going to need to read sooner or later, or training equivalent to that documentation. Good tutorials are good for starting, but they don't make you a professional.
I guess that in 10 years anyone will be able to create websites for billions of internet-connected users. But for now, as easy as it is, it is still complicated enough to require an expert. In the same way, 20 years ago you needed an expert to make a 3D game, and nowadays there are plenty of technologies that allow you to do that with a limited amount of programming knowledge.
I have seen horrible things, usually security related, because untrained people think that they can achieve anything with just standard configurations and quick tutorials. And it looks like they were able to do it, until something really bad happens. Even people with long experience can make mistakes, because it is a complex thing.
Yeah that line is blurring, too.
A YC S15 startup, Convox (http://convox.com/), aims to "make AWS as easy as using Heroku." It looks really promising.
This is definitely a goal of Convox: to remove as much AWS complexity as possible.
Our approach matches this guide to a tee. We are using CloudFormation to set up a private app cluster, as well as to create and update (deploy) apps. We are also using ASGs.
The instance utilization point is spot on too. The first thing Convox does to make this easy is a single command to resize your cluster safely (no app downtime).
Coming next is monitoring of ECS and CloudWatch, and Slack notifications if we detect over- or under-utilization.
I strongly believe that these AWS best practices can and should be available for everyone. For anyone starting from scratch or migrating apps off a platform or EC2 Classic onto "modern" AWS.
I'd like to use AWS more, but each time I tried to get into it I felt overwhelmed. I currently use PagodaBox a lot, which is great (most of the time) because it handles a lot of the complexity for me, but it can often be expensive. How does Convox compare to PagodaBox?
Convox has the same goal of a PaaS: to give you and your team an easy way to focus on your code and never worry about your infrastructure.
One big difference with Convox is that we accomplish this with single-tenant AWS things. You and your team's deployment target is an isolated VPC, ECS (EC2 container service), and ELB (load balancers).
If you're asking for a cost comparison, we're building Convox to be extremely cost competitive by unlocking AWS resource costs for everyone.
It's easiest to compare the cost of memory across platforms, though not always apples to apples...
The base Convox recommendation is 3 t2.smalls which is 6 GB of memory which costs about $100 / month. If your app can be sliced up into 512 MB processes, you can easily run 10 processes, which could be 2 to 5 medium traffic PHP apps on the cluster.
I'm finding the PagodaBox pricing calculator a bit confusing, but six 512 MB processes, so 3 GB of memory, is $189.
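For anyone who wants to redo that comparison with their own numbers, here is a minimal sketch of the per-GB arithmetic. It only uses the two figures quoted above (about $100 / month for 6 GB on three t2.smalls, $189 for 3 GB on PagodaBox); the variable and function names are made up for illustration, and as noted the comparison is not strictly apples to apples.

```python
# Figures quoted in the comment above; both are approximate monthly prices.
convox_gb, convox_usd = 6.0, 100.0   # 3 x t2.small, about $100 / month
pagoda_gb, pagoda_usd = 3.0, 189.0   # 6 x 512 MB processes, $189 / month


def usd_per_gb(monthly_usd, memory_gb):
    """Monthly cost of one GB of memory."""
    return monthly_usd / memory_gb


convox_rate = usd_per_gb(convox_usd, convox_gb)
pagoda_rate = usd_per_gb(pagoda_usd, pagoda_gb)

print(f"Convox on AWS: ${convox_rate:.2f} per GB-month")
print(f"PagodaBox:     ${pagoda_rate:.2f} per GB-month")
print(f"Ratio:         {pagoda_rate / convox_rate:.2f}x")
```

With the quoted prices this works out to roughly $16.67 versus $63.00 per GB-month.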
Do you plan to eventually charge a monthly fee for using the command-line tool?
The most straightforward model, and where we are already making some money, is running Convox as a managed service.
In this setup you and your team get Convox API keys. Convox installs, runs and updates everything for you in our accounts. You get a monthly bill that's your AWS resource costs plus a percentage to Convox for management.
We will be tweaking this model to sell packages so bills are really easy to understand.
Some other experiments we're doing...
We sell support packages and professional services for app setup, migration and custom feature development.
We have a per-seat model for productivity features. Private GitHub repos and Slack integrations are $19 / user / month. There are more closed SaaS tools like this coming.
Infra is trending to commodity prices industry wide.
We'll be selling SLAs, support, productivity tools on top of that infra.
You'll get a cutting edge private platform without hiring and managing your own devops team to build and maintain it.
Open source users will help grow the user base and make the platform better without us running a freemium platform.
The plan is to get the Convox API locked in while mastering advanced AWS like VPC, ECS, ELB, Kinesis and Lambda behind the scenes.
Long term, yes, and in tandem with the other cloud providers leveling up. For example, Google Cloud Logging (for container logs on GCE) is still in beta.
Disclaimer: I work for Pivotal, who donate the majority of the engineering effort to CF.
CloudFoundry is a really solid platform, but there is a very important distinction between CloudFoundry and Convox.
Convox is a very thin layer on top of "raw" AWS. It gives you a PaaS abstraction, but behind the scenes is a well-configured VPC, ECS, Kinesis, Lambda, KMS, etc.
For those of us with no need to run on multiple clouds, using pure AWS is simpler, cheaper and more reliable than a middleware like CloudFoundry or Deis.
If you want to run a private platform without bringing in operational dependencies like etcd (Deis) or Lattice (CloudFoundry), give Convox a look.
Every Convox cluster is an autoscale group managed cluster of ECS instances.
Every Convox app gets its own Kinesis stream for logs, ELB for load balancing, and S3 buckets for settings, build artifacts and encrypted environment. And the app processes are run via ECS.
So I'm confident we could handle the 7 apps in a single cluster, scale the cluster instance size and count, scale any individual app process type, and handle any individual app load balancing or log throughput.
You can provision some services like RDS with our tooling, which makes it really easy to link them to apps. You can also bring your own services like elasticsearch or a pre-existing RDS and set them in an app's environment to use them.
There are a couple monitoring tools built in.
We automatically monitor AWS events like ECS capacity problems and send Slack notifications.
You can also use our tooling to forward all your logs to Papertrail and configure your searching and alerting there.
More CloudWatch Logs and Metrics work is coming in the near future.
some additional questions though:
1. are security groups opaque within convox or are they exposed to developers?
2. when you say monitoring tools are built in and you have tools for logging does this lock me in to the convox log pipeline and monitoring? what if i want to use sensu on my instances? do i have to add sensu to every container? if you run, for instance, five containers per vm do i pay the overhead of five separate sensu instances? same question for something like logstash?
3. you mention vpc. aws has proven stingy with vpc service limit requests in the past. i have trouble getting them to grant more than low double digits per region/account. can i run multiple convox racks per vpc or is the one vpc per rack a hard requirement?
2. Currently every app gets a Kinesis stream and we tail all Docker logs and put them into Kinesis. Then `convox logs` can stream logs from Kinesis, and `convox services add papertrail` adds a Lambda / Kinesis event source mapping to emit the stream as syslog to Papertrail.
I'm pretty happy with this setup and think it represents a good default infrastructure that is still extensible.
Would Kinesis -> Lambda -> Sensu make sense too? It's a pretty new pattern but this seems a lot saner to me than per-container log agents, or even bothering with custom logging drivers.
That said, one user has been using logstash by bringing a custom AMI with his logstash agent and creds baked in.
3. It's one VPC per rack, but I could see modifying that. We've already started to parameterize some VPC settings like the CIDR block to help integrating with your existing VPC usage.
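To make the Kinesis -> Lambda -> syslog idea from point 2 a bit more concrete, here is a rough sketch of what such a Lambda handler could look like. This is not Convox's actual code. It relies only on the documented shape of a Kinesis event delivered to Lambda (base64 payload under `Records[].kinesis.data`); the `hostname` and `app` defaults are hypothetical, and the step that actually ships lines to Papertrail (or Sensu, or anything else) is deliberately left out.

```python
import base64
from datetime import datetime, timezone

PRI_USER_INFO = 14  # syslog priority: facility "user" (1) * 8 + severity "info" (6)


def to_syslog_lines(event, hostname="rack", app="app"):
    """Decode a Lambda Kinesis event into RFC 5424-style syslog lines."""
    lines = []
    for record in event.get("Records", []):
        kinesis = record["kinesis"]
        # Kinesis payloads arrive base64-encoded inside the Lambda event.
        message = base64.b64decode(kinesis["data"]).decode("utf-8", "replace").rstrip("\n")
        arrived = kinesis.get("approximateArrivalTimestamp", 0)
        ts = datetime.fromtimestamp(arrived, tz=timezone.utc)
        lines.append(f"<{PRI_USER_INFO}>1 {ts.isoformat()} {hostname} {app} - - - {message}")
    return lines


def handler(event, context):
    # A real forwarder would write these lines to the log service's syslog
    # endpoint; this sketch only returns them.
    return to_syslog_lines(event)


sample = {
    "Records": [
        {
            "kinesis": {
                "data": base64.b64encode(b"web.1 GET / 200\n").decode("ascii"),
                "approximateArrivalTimestamp": 0,
            }
        }
    ]
}
print(handler(sample, None))
```

The appeal of this shape is that swapping the destination only means changing what happens to the returned lines, with no agent inside the containers.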
The feedback was that Lattice is not what developers wanted, so it's been wound up in favour of MicroPCF, which is a single VM image that runs an entire, actual Cloud Foundry installation.
When developers decide they want to scale up to any size, they simply retarget a regular AWS/vSphere/OpenStack/Azure CF API server and push again.
I'm sure Convox has a single-VM version I can tinker with on my laptop.
By making those notions and technologies easy and cheap to access, AWS suddenly gave devs the idea that they can roll out complex infrastructure on their own, similar to copy-pasting a piece of code. Well, it is still a bit harder than that, and if you are that kind of dev (which I'd applaud), you'd better dedicate some time to learning those technologies.
Me: "I just said I was interested in DevOps and would like to give it a shot, I don't know how long it'll take, I'm working on it."
We're using gateway -> lambda -> dynamodb, and there are tons of gotchas and small things that AWS needs to iron out, especially with gateway -> lambda.
CPU/Memory aren't the only measures of underutilization. If you require high instantaneous bandwidth throughput, then the networking capacity available to your instance roughly increases with the size of your instance. This includes both EBS and other network traffic.
Table with Low/Medium/High: https://aws.amazon.com/ec2/instance-types/#instance-type-mat...
Example benchmark with c3 instances: http://blog.flux7.com/blogs/benchmarks/benchmarking-network-...
If you're more concerned with just EBS network throughput, check out the table on this page instead: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-ec2-...
Be extremely careful when using public customized AMIs. A lot of the time ~/.ssh/authorized_keys contains public keys left behind by whoever built the image, and this is obviously a huge security problem.
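One cheap check after booting such an image is to list every key OpenSSH would accept, since it reads them from `authorized_keys` files inside each user's `.ssh` directory. A minimal sketch, assuming you can read the filesystem as root; the function name and the demo directory layout are made up, and the demo runs against a throwaway temp directory rather than a real image.

```python
import os
import tempfile


def find_authorized_keys(root):
    """Return {path: [key lines]} for every non-empty authorized_keys file under root."""
    found = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        if os.path.basename(dirpath) != ".ssh":
            continue
        for name in filenames:
            if name not in ("authorized_keys", "authorized_keys2"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                # Skip blank lines and comments; everything else grants access.
                keys = [ln.strip() for ln in fh if ln.strip() and not ln.lstrip().startswith("#")]
            if keys:
                found[path] = keys
    return found


# Demonstration on a throwaway directory tree instead of a real image.
with tempfile.TemporaryDirectory() as root:
    ssh_dir = os.path.join(root, "home", "ubuntu", ".ssh")
    os.makedirs(ssh_dir)
    with open(os.path.join(ssh_dir, "authorized_keys"), "w", encoding="utf-8") as fh:
        fh.write("# left behind by the image publisher\n")
        fh.write("ssh-ed25519 AAAAexample publisher@example\n")
    leftover = list(find_authorized_keys(root).values())

print(leftover)
```

On a real instance you would point it at `/` (or at least `/home` and `/root`) and treat any key you did not put there yourself as hostile.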
It seems like the assembly of the AWS ecosystem.
Does anyone else have a favourite hammer for this particular nail? I'd love to have something better than our home-baked solution, but I'm yet to find anything which doesn't introduce other flaws, such as an incomplete implementation (missing parameters or resource types) or ultimately making a leaky abstraction on top of CloudFormation somehow.
I ended up brewing a reasonably straightforward solution using Python as a (minimal) DSL which emits JSON. Its primary purpose is to support the whole of the CFN ecosystem (not just implement some small part of EC2, for instance) while also not trying to be too clever.
It has about 50-100 lines of Python which implement helper functions such as ref(), join() and load_user_data(), and not many other things. There is an almost 1-to-1 correspondence between the generated CFN configuration and the Python source. As a bonus it checks for a few common mistakes like broken refs or parameters which aren't used.
I have heard that similar solutions have been reinvented in a few places, including the BBC. But I'm yet to see a good public solution!
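For readers wondering what such a minimal Python layer might look like, here is a rough sketch. It is not the commenter's code: `ref()` and `join()` are named after the helpers mentioned above, while `find_refs()`, `check()` and the example template are invented for illustration. `load_user_data()` is left out, and only `Ref` is checked (not `Fn::GetAtt` or `DependsOn`).

```python
import json


def ref(name):
    """CloudFormation Ref intrinsic."""
    return {"Ref": name}


def join(sep, *parts):
    """CloudFormation Fn::Join intrinsic."""
    return {"Fn::Join": [sep, list(parts)]}


def find_refs(node):
    """Yield every name used in a {"Ref": ...} anywhere inside node."""
    if isinstance(node, dict):
        if set(node) == {"Ref"}:
            yield node["Ref"]
        else:
            for value in node.values():
                yield from find_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_refs(item)


def check(template):
    """Return (broken refs, unused parameters) for a template dict."""
    params = set(template.get("Parameters", {}))
    known = params | set(template.get("Resources", {}))
    used = set(find_refs(template.get("Resources", {})))
    used |= set(find_refs(template.get("Outputs", {})))
    # Pseudo parameters such as AWS::Region are always defined.
    broken = {name for name in used if name not in known and not name.startswith("AWS::")}
    return sorted(broken), sorted(params - used)


template = {
    "Parameters": {"Env": {"Type": "String"}, "Unused": {"Type": "String"}},
    "Resources": {
        "Bucket": {
            "Type": "AWS::S3::Bucket",
            "Properties": {"BucketName": join("-", "logs", ref("Env"), ref("AWS::Region"))},
        },
        "Topic": {
            "Type": "AWS::SNS::Topic",
            "Properties": {"DisplayName": ref("Envv")},  # typo on purpose
        },
    },
}

broken, unused = check(template)
print(json.dumps(template, indent=2))
print("broken refs:", broken, "unused parameters:", unused)
```

Because the template is just a dict, the generated JSON stays close to 1-to-1 with the source, and the checks run before anything is sent to AWS.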
There are validation problems even within specific tools: Just look at the RDS setup alone: A ton of options that are often mutually exclusive. It's brutal. And don't get me started with security groups.
At Monsanto, we built our own toolset, and open sourced it. It is all Scala, so it might be a bit of a learning curve for many folks, but there's an actual attempt in there at making sure that if you can write it, it is valid: the tool blows up before the template gets to AWS, instead of AWS realizing something went wrong and having to roll everything back.
Writing reusable code in Terraform is an exercise in frustration due to the extreme clumsiness of HCL (which, I understand, was used because "YAML is complicated"--well, that's true, but YAML isn't a good solution either, you're HashiCorp, you wrote Vagrant, you already know how to do this!). The application architecture is reckless and full of race conditions; your state will be hosed if one resource errors out at the wrong time, while other resources are being successfully updated--the resources that return successfully after the failed resource will on many occasions fail to be persisted to state. What's more, application testing seems to be at best an afterthought: there have been regressions in the providers that will break your existing states.
I would under no circumstances use Terraform if I didn't have clients who had selected it before I was working with them. If in AWS, I would use CloudFormation, with a tool like Cfer (which is excellent, reliable code) or SparkleFramework (which is more full-featured but I hope you never need to debug it) to provision my stuff.
(Full disclosure: I'm building a much, much better provisioner for multi-provider cloud infrastructure. Neither of the projects I recommend are mine; mine's not done yet.)
 - https://github.com/seanedwards/cfer
 - http://www.sparkleformation.io/
It's older than CloudFormation and Terraform (born 2010). It can manage anything that someone's written a driver for. So far that includes AWS, Azure, vSphere/vCloud, OpenStack, VirtualBox, Google Compute Engine, Apache CloudStack and there might be others I missed.
It stores state in a database. It is able to recover from mismatches between the state of the world and the desired state. Cloud Foundry users have been using it for years to deploy and update CF installations. Pivotal Web Services (I work for Pivotal, in a different division) has been upgrading to the most recent CF release every few weeks, live, without much fuss, for years.
For any kind of heavily stateful infrastructure, BOSH is a strong candidate.
Honestly curious, can you point to one or two?
I have a sneaking hunch that the continuing problems with template-file resources (complaining that the "rendered" attribute doesn't exist in dependent resources) are related to this, but can't prove that; my clients don't pay me to debug Terraform, but to get their stuff working, and that doesn't leave much time to get in-depth with it now that I've decided not to use it for my own purposes anymore.
One of the convenient things about software that doesn't exist is that it doesn't have any bugs.
Let your software speak for itself when it exists; until then, this seems an undeserved critique of software, and a team, that is solving problems every day.
HCL has improved dramatically, and now that template strings are a thing, most of my variable interpolation issues are solved. However you still can't specify lists as input variables so you frequently have to resort to joining and splitting strings. It's hackish and worse, changing one value in the list will invalidate all other resources that use the variable.
Race conditions and dependency cycles are still a problem, particularly with auto scaling groups and launch configurations -- I have to migrate them in two steps (create then destroy) to avoid a conflict. Same with EBS volumes: I ended up scripting my instance to attach the volume by itself, otherwise there are ordering issues when destroying and replacing.
There are also missing features, such as the ability to create elasticache redis clusters and cloudformation resources.
I'm still glad that I went with Terraform though. It takes a good amount of time to get around the limitations and bugs, which can be really frustrating, but when it works, it works beautifully.
Terraform is relatively new and improving rapidly. It has its problems, but it's light-years beyond CloudFormation. It's clear that Amazon doesn't place a high priority on making CloudFormation easy to use, or to support new features. The right approach to any problems with Terraform is not to spread FUD about it like below, but to contribute code fixes.
Oh, HCL is fine, you say so authoritatively? Well then do me a solid and show me an if statement, show me a for loop. Because you're not building nontrivial, reusable infrastructural modules without logic. I know. I've tried. I've committed, between different projects and clients, somewhere around ten thousand lines of Terraform and probably half are copy-paste garbage because HCL is so crippled a tool.
It hurts me to say this at a deep and visceral level: Terraform's interpolation syntax makes freaking Ansible and its "no, really, it's totally cool, string templates for logic are awesome" look good.
> The right approach to any problems with Terraform is not to spread FUD about it like below, but to contribute code fixes.
Spread FUD? Oh, no no no, you can take your assertions of FUD and insert them somewhere uncomfortable, thank you very much. I wrote Terraframe specifically to contribute back to the Terraform community, to make it better, and stopped (to create a different project) because I was stymied. By no documentation, by HCL <-> JSON not actually working, and by no interest from the developers in any sort of dialogue about actually fulfilling the promises they themselves assert for their software. Between this and bugs that a trivial testing framework should catch (Why are you validating AWS resource names differently between point releases? Why are you changing that validation to be wrong? Why are you breaking my existing states when you've done this? Why did your tests not catch this before you pushed this out to your entire userbase?) I cannot take the project seriously as a tool for being used in infrastructure I care about. Because I don't trust them to take Terraform seriously, either.
 - https://github.com/eropple/terraframe
There's still a lot to do to make it ideal for public consumption (like writing docs and freezing the API), but it'll get there sometime soon. PRs are most welcome.
What I love most about Terraform is that we can include the output of terraform plan in pull requests that make infrastructure changes. Then our continuous deployment process runs plan again and requires an identical output before running apply. This both makes it easier for team members to review changes and ensures that we don't accidentally destroy infrastructure, which is really easy to do with a lot of these infrastructure-as-code tools.
The other thing that Terraform has going for it over CloudFormation is for hybrid cloud deployments, since it can provision infrastructure in vSphere and OpenStack as well as AWS.
Stuff we've thought of but haven't gotten around to yet:
- Build relatively simple tooling around terraform and Consul to acquire a lock before running apply...we haven't gone to that length yet since only our continuous deployment environment has credentials to mutate production and it runs builds of the infrastructure project sequentially.
- Watching the Consul key where the tfstate is stored for changes to kick off sanity checks to ensure that everything is still healthy.
They're both so flexible that there's probably other ways in which they'd work well together that we haven't thought of yet.
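The "plan must be identical before apply" gate described above can be sketched in a few lines. This is only an illustration of the comparison step, not the poster's pipeline: capturing the plan text from the pull request and from the deploy job is assumed to happen elsewhere, the function names are hypothetical, and the sample plan strings are invented placeholders rather than real Terraform output.

```python
import hashlib


def plan_fingerprint(plan_text):
    """Hash a plan after dropping blank lines and trailing whitespace."""
    lines = [ln.rstrip() for ln in plan_text.splitlines() if ln.strip()]
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def safe_to_apply(reviewed_plan, fresh_plan):
    """True only if the plan computed at deploy time matches the reviewed one."""
    return plan_fingerprint(reviewed_plan) == plan_fingerprint(fresh_plan)


# Invented placeholder text standing in for captured plan output.
reviewed = "+ aws_instance.web\n"
fresh_same = "+ aws_instance.web   \n\n"
fresh_drifted = "+ aws_instance.web\n- aws_db_instance.main\n"

print(safe_to_apply(reviewed, fresh_same))     # whitespace-only difference
print(safe_to_apply(reviewed, fresh_drifted))  # an extra destroy appeared
```

Normalizing whitespace before hashing is a design choice: it avoids failing a deploy over copy-paste artifacts while still catching any change in what would be created or destroyed.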
Also https://github.com/russellballestrini/botoform looks to be a newer solution in this space
Terraform is another option and then there's the model we're actively moving towards at work: using Ansible to abstract and completely replace calls to CloudFormation with a combination of existing and bespoke modules to dynamically spin up the infrastructure we need.
We are using very advanced CloudFormation in the open source Convox platform.
I have touched every corner of CF including lots of Custom Resources.
Right now we are using the golang template tools and tests to generate our templates.
But I have lots of needs and ideas for improving this. A CF template compiler and simulator should be possible, giving us all tons of confidence in making template changes and therefore any infrastructure update.
I have some sketches that I haven't published yet.
And I strongly believe CF is the best tool in this space if you're all in on AWS. Let Amazon be responsible for operating a transactional infrastructure mutation service. It's ridiculously hard to do this right.
If you want to brainstorm some ideas send me a message :)
Also, you might want to update your HN profile with contact info.
Terraform is awesome if you want an OSS project to manage multiple cloud vendors.
But I think that infrastructure change management is a really hard problem and the state of the art solution is how AWS runs CloudFormation as a managed service.
Once it's set up properly, it's amazing watching what CloudFormation can do. It can execute updating 20 instances to roll out a new AMI, and then roll the whole operation back on demand or if a failure happens. All with no application downtime in the cluster!
This obviously doesn't necessarily handle teardown very well, and it tends to be copying boilerplate and modifying it, but I find it the most straightforward thing, and simple, if a little verbose.
Have fun migrating data and re-indexing constantly!
It sounds like you have a lot of experience with Dynamo. All the use-cases I seem to keep coming up with are more for storing global environment keys outside of the environment itself and using a large number of IAM roles to access individual keys, and as a kind of throwaway 'I need to store this somewhere, but it doesn't really fit in the main DB, and I still need it to persist for at least awhile' case.
An example: if you provision X capacity units, you're actually provisioning X/N capacity units per partition. AWS is mostly transparent (via documentation) about when a table splits into a new partition (I say "mostly" because I was told by AWS that the numbers in the docs aren't quite right), but you'll have no idea how the keys are hashed between partitions, and you won't know if you have a hot partition that's getting slammed because your keys aren't well-distributed. Well, no, I take that back -- you will know, because you'll get throttled even though you're not consuming anywhere near the capacity you've provisioned. You just won't know how to fix it without a lot of trial and error (which isn't great in a production system). If your r/w usage isn't consistent, you'll either have periods of throttling, or you'll have to over-provision and waste money. There's no auto-scaling like you can do with EC2.
Not trying to knock the product: still using DDB... but getting it to a point where I felt reasonably confident about it took way longer than managing my own data store... and then they had a 6-hour outage one early, early Sunday morning a couple months ago. Possible solution: dual-writes to two AWS regions at once and the ability to auto-pivot reads to the backup region. Which of course doubles provisioning costs.
Ok, maybe I'm knocking it a little. It's a good product, but there are definitely tradeoffs.
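To make the X/N point above concrete, here is a small back-of-the-envelope sketch. It is a simplified model of the behaviour described in this comment (capacity split evenly across partitions), and every number in it is invented; the real partition count and key distribution are exactly the things you cannot see.

```python
def per_partition_capacity(provisioned_units, partitions):
    """Capacity each partition gets if X provisioned units are split over N partitions."""
    return provisioned_units / partitions


def throttled_units(provisioned_units, partitions, demand_by_partition):
    """Units per second rejected on each partition, given the demand it receives."""
    limit = per_partition_capacity(provisioned_units, partitions)
    return [max(0.0, demand - limit) for demand in demand_by_partition]


provisioned = 1000
demand = [400, 50, 50, 50]  # one hot partition; numbers are invented

limit = per_partition_capacity(provisioned, len(demand))
throttled = throttled_units(provisioned, len(demand), demand)

print(f"per-partition limit: {limit:.0f}")
print(f"total demand: {sum(demand)} of {provisioned} provisioned")
print(f"throttled per partition: {throttled}")
```

In this example the table is only using 550 of 1000 provisioned units, yet the hot partition exceeds its 250-unit share and gets throttled by 150.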
Agreed that DDB documentation could be better on the best practices. I found this deep dive youtube video very helpful: https://www.youtube.com/watch?v=VuKu23oZp9Q
That's actually another instance of the lack of transparency and trial-and-error: "oh hey, writes are getting throttled... no idea why... let's drop this index and see if it helps".
And in the case of both the redis and memcached backends, if the maintenance requires restarting redis/memcached or rebooting the instance, you lose all data in the cache (at least up until your last backup). For this particular project, that amount of downtime would easily cause a real outage for customers, and was unacceptable.
DynamoDB db = new DynamoDB(new AmazonDynamoDBClient());
Table myTable = db.getTable("table");
Item myItem = myTable.getItem("id", id);
That said, since you should be using an infrastructure provisioning tool like CloudFormation, the tagging solution should not be a particularly big obstacle.
Does anybody have experience with running a VoIP (e. g. Asterisk) on AWS?
But even with smaller instances, AWS network performance is going to be about the same as any VPS provider. The only way to get guaranteed performance is to do Colo, which is expensive.
At first, I used instances that were not very performant, causing some problems. I quickly moved to more powerful instances and since then there have been no problems at all. There is a slight delay, which is normal, since the servers are not in the same country as the users anymore, but the users haven't noticed it at all.
I can't say how well it scales, though. I have a small user-base, so scaling has not been a problem yet. Network performance might become a problem if you have a huge user-base.
If you want something simpler, you have Heroku and a bunch of similar things which make a bunch of decisions for you -- but you don't have the flexibility there that you do with AWS of course.
That's exactly right, avoid manual infrastructure.
Someday we will all have something like Rails for infrastructure. Strong conventions around best practices.
If you follow these conventions you can avoid bespoke or manual configurations and focus solely on your app logic.
We're building Convox to advance this goal.
1. Use CloudFormation only for infrastructure that largely doesn't change, like VPCs, subnets, internet gateways etc. Do not use it for your instances, databases etc. I can't recommend that enough; you'll get into a place where updating them is risky. We have a regional migration (like database migrations) that runs in each region we deploy to and sets up ASG, RDS etc. It gives us control over how things change, for example if we need to change a launch conf.
2. Use auto-scaling groups in your stateless front ends that don't have really bursty loads, it isn't responsive enough for really sharp spikes (though not much is). Otherwise do your own cluster management if you can (though you should probably default to autoscaling if you can't make a strong case not to use it).
3. Use different accounts for dev / qa / prod etc. Not just different regions. Force yourself to put in the correct automation to bootstrap yourself into a new account / region (we run in 5 regions in prod, and 3 in qa, and having automation is a lifesaver).
4. Don't use ip addresses for things if you can help it, just create a private hosted zone in Route53 and map it that way.
5. Use instance roles, and in dev force devs to put their credentials in a place where they get picked up by the provider chain, don't get into a place where you are copying creds everywhere, assume they'll get picked up from the environment.
6. Don't use DynamoDB (or any non-relational store) until you have to (even though it is great). RDS is a great service and you should stick with it as long as you can (you can make it scale a long way with the correct architecture, and bumping instance sizes is easy). IMO a relational store is more flexible than others since you (at least with Postgres) get transactional guarantees on DDL operations, so it makes it easier to build in correct migration logic.
7. If you are using CloudFormation, use troposphere: https://github.com/cloudtools/troposphere
8. Understand which instances need internet access and which ones don't, so you can either give them public IPs or put in a NAT. Sometimes security teams get grumpy (for good reason) when you open up machines to the internet that don't need to be, even if it's just outbound.
9. Set up ELB logging, and pay attention to CloudTrail.
10. We use CloudWatch Logs. It has its warts (and it's a bit expensive), but it's better than a lot of the infrastructure you see out there (we don't generally index our logs, we just need them to be viewable in a browser and exportable for grep). It's also easy to get started with, just make sure your date formats are correct.
11. By default, stripe yourself across AZs if possible (and it's almost always possible). Don't leave it for later; take the pain up front, you'll be happy about it later.
12. Don't try to be multi-region at first if you can avoid it; just replicate your infrastructure into different regions (other than users / accounts etc.). People get hung up on being able to flip back and forth between regions, and it's usually not necessary.
edit: Track everything in cloudwatch, everything.
We built a tool called Liquid Sky (https://liquidsky.singtel-labs.com) to help us keep track of the cost impact of the changes we constantly make. I did mention that we use cloud for reasons beyond cost, but we definitely still want to know that we're sensible and maximise cost efficiency as well; it's just another (important) factor. Because we change our cloud resources so frequently, we didn't want a very rigid process for dealing with cloud cost. Hence, we built Liquid Sky in a way that gives our engineers the freedom to explore better ways of running things on the cloud while keeping cost in check and keeping the team (including cost guardians) in the loop.
About comparison with other services: I previously worked at Deloitte Analytics, managing the cloud services provided by IBM SoftLayer. You cannot compare them. AWS offers much more, and I was not really satisfied with SoftLayer; for example, I had a problem with a network upgrade they did in January 2014 and lost a lot of data, with poor support to restore it. Also, the starting price of $25 per month is really expensive. AWS is far more mature and interesting.
For my own servers I use CloudAtCost because it's cheaper, but if I ran a business I would for sure go with AWS. If you are making money, it's not that expensive, and if you stick with Amazon's advice and philosophy it is very reliable.
6. Not starting a box/instance/database and forgetting it's running until you receive the bill after your free tier expires.
6. Not setting up billing alarms, http://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/...
I'm also not sure how to make the jump from exporting AWS_ACCESS_KEY_ID to having my instances automatically request the permissions they need - STS?
Check out instance profiles. This feature allows any AWS API-aware application to request credentials on demand, eliminating key management/rotation:
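Under the hood, an instance with a profile attached can read short-lived credentials for its role from the instance metadata service, and the SDKs do this for you through their default credential chain. As a rough sketch of what that involves, the snippet below parses a credentials document of the shape the metadata service returns and decides when to refresh. The key values, the role credentials and the `margin` default are all invented for illustration, and the HTTP fetch itself is omitted because it only works from inside an instance.

```python
import json
from datetime import datetime, timedelta, timezone

# Documents like the sample below are served per role under this path.
METADATA_URL = "http://169.254.169.254/latest/meta-data/iam/security-credentials/"


def parse_credentials(document):
    """Turn a metadata credentials document (JSON text) into a plain dict."""
    doc = json.loads(document)
    expires = datetime.strptime(doc["Expiration"], "%Y-%m-%dT%H:%M:%SZ")
    return {
        "access_key": doc["AccessKeyId"],
        "secret_key": doc["SecretAccessKey"],
        "token": doc["Token"],
        "expires": expires.replace(tzinfo=timezone.utc),
    }


def needs_refresh(creds, now, margin=timedelta(minutes=5)):
    """True once the credentials are within `margin` of expiring."""
    return now >= creds["expires"] - margin


# Invented sample values; real ones are issued and rotated by AWS.
sample = json.dumps({
    "Code": "Success",
    "AccessKeyId": "ASIAEXAMPLE",
    "SecretAccessKey": "example-secret",
    "Token": "example-session-token",
    "Expiration": "2015-01-01T12:00:00Z",
})

creds = parse_credentials(sample)
early = datetime(2015, 1, 1, 11, 0, tzinfo=timezone.utc)
late = datetime(2015, 1, 1, 11, 56, tzinfo=timezone.utc)
print(needs_refresh(creds, early), needs_refresh(creds, late))
```

In practice you would not write this yourself: leaving AWS_ACCESS_KEY_ID unset and letting the SDK pick up the role credentials is the whole point.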
1) Not giving out your access and secret keys in scripts/buckets.
2) Always using IAM roles with your EC2