I'm unsure what the use case was, but if it was a testing or staging environment dedicated to each developer, having a separate right-sized instance for each is quite sane and reasonable.
EC2 instances such as the T2/T3 series are cheap enough that there's normally not a need to be so miserly in my experience.
> Did you weigh these costs against the value of the developers not stepping on each other's toes using the same instance?

Separate users with sudo privileges provide zero isolation.
Each developer's machine is a separate instance. If the whole system could run on "one dedicated blade," then the developers could easily run a local instance on their own machines (either directly or in a VM) without stepping on anyone's toes.
Using AWS for local development seems wasteful and over-complicated to me. The only good reason I can see for doing it is if you're really vendor-locked into AWS-proprietary services, and it's impractical to run your code anywhere except on AWS (but that's arguably a mistake in itself).
What the parent you're responding to likely meant was that, by running everything on one system ("one dedicated blade"), you force everything on that system to share certain things: the set of system-installed packages; the nginx configuration, if nginx needs to be installed and running, now shared between all of the services on that machine; and even the OS itself (e.g., if it needs to go down for updates, or needs to do, say, a dist-upgrade, then all of the associated services on that instance need to dist-upgrade together, and one laggard can hold everyone back).
Even in the case of developers using an instance on AWS, the same applies, really: a dev, legitimately developing some component, could cause cross-talk for others in all of the same ways listed above. I do agree that giving devs dedicated instances might not be appropriate in many cases, and perhaps it wasn't appropriate in yours. But there are two separate questions here: whether the devs needed those instances at all or could develop on local laptops, and whether putting all users on a single instance works well.
Even for development, given that most corps seem to love giving OS X laptops to devs who code primarily for Linux servers, there might be advantages to letting them develop on the actual target platform.
> There was just this culture (made easy with docker containers) of "spin up another box" for every silly little tool.
Docker here should make it rather trivial to combine everything onto a single instance, since that's what isolation gets you: a well defined unit that doesn't have endless unknown dependencies on the host system.
I really don't see why so many run their environment on macOS through Homebrew etc., which is surely going to require modifications here and there when deploying to Linux servers.
Of course ports can't be shared between app servers, but you can get around that anyhow.
This is being sold as a "step in the right direction", where "right direction" is anything that reduces cloud costs. (Payroll costs are apparently invisible, so the time people waste dealing with this baloney doesn't count. Not to mention the annoyance.)
I'm tempted to point out that they could save a lot more money by merging microservices into macroservices....
Wait, you're saying there's a Bachelor of Science DevOps degree?
I interviewed at (a fast growing real estate startup) and as an old guy, I highlighted my vast enterprise Linux engineering experience. The response from management was, oh, that's nice but doesn't matter here. I can teach people the command line in one day. This is a big AWS and exclusively Linux shop.
What do you say to that?
Grey-haired Linux folks have hammered on complex, changing, sometimes esoteric systems at different companies with unique challenges, combined with having had to learn a programming language. To view these people as obsolete is the ultimate folly of youth here in IT.
To suggest that it's all different now, your Shell/Perl/Webserver/database/caching/networking experience somehow has nothing on Python/Terraform/AWS/Docker really shows inexperience of hiring managers.
I learned both, and I can tell you AWS+someyaml+Python are MUCH easier than any of the stuff I listed above. Yet, there's a new generation of hiring managers that believes otherwise.
Today at work I had to explain absolutely basic DNS to someone. I dunno, call me skeptical about this new paradigm business. The paradigms look awfully familiar.
Both with tech and non-tech related learning, my autodidact process (largely determined by growing up through the late 90's and early aughts) was the following:
(1) Google search or Amazon search
(2) Blitz-read some website I thought might be relevant from the Google search results, or blitz-read for author and subject-matter info I thought might be relevant from the Amazon search results
(3) Stop at (2) because I had pieced together a solution or had gleaned a sufficient amount of information; or repeat (1) and (2) until piecing together a solution or gleaning...
This rather clunky but speedy (notice I did not say efficient) process likely accounts for far more of my knowledge and skill base than I'd care to admit to friends and colleagues.
All this to say: management's response to you is representative of a default mode of thinking determined (to a non-trivial degree) by the environment in which they grew up.
This default mode of thinking has a number of short-term benefits, but often results in many long-term headaches. Indeed, you really can teach someone the command line in a day. However, this someone will end up doing stuff I do with the command line, such as using ls 100x more than is necessary or almost always changing directories one at a time.
EDIT: I agree that OPs choice of words makes the idea expressed ambiguous.
All DevOps / SysAdmins / etc. are "self-taught." So it's unclear what the OP meant.
I consult with multiple customers and they all want cloud. It's "Cloud First" and any discussion to the contrary results in lost business. If you do that you are considered a Dinosaur.
On the issue of predictable load, most businesses not only have predictable load, but a lot of capacity is simply allocated and never used. VMware serves such scenarios even better by going for heavy over-allocation. Everyone will demand 10 servers with 8 vCPUs and 64 GB of RAM without actually using even 10% of the allocated capacity.
But hey Cloud is cool so I have to be on it.
That’s like the old school Netops people who got one AWS certificate, did a lift and shift of a bunch of VMs and then proclaimed their client has “moved to the cloud”. The client still has the same amount of staff, same processes, etc. Of course it isn’t going to save any money.
- Your Colo has direct connect to AWS, so you can deploy EC2 under times of load
- Your Colo offers some sort of virtualized infrastructure that you could wire into your network and offload to under times of load
- Your Colo has a virtual server provider running that you can offload to in times of load.
- Is close enough to one of the big 3 that a Site to Site will work fine
If you can predict workload then you could also reduce your cloud bills, in theory, by spinning instances up and down on a schedule.
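A minimal sketch of what that can look like with boto3, assuming instances carry a hypothetical "Schedule" tag and credentials are already configured; the region and tag values are placeholders:

```python
# Stop dev/test instances outside business hours; a matching job would call
# start_instances() in the morning. Run from cron or an EventBridge rule.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # placeholder region

def tagged_instance_ids(key, value):
    """Return IDs of running instances carrying the given tag."""
    ids = []
    paginator = ec2.get_paginator("describe_instances")
    filters = [
        {"Name": f"tag:{key}", "Values": [value]},
        {"Name": "instance-state-name", "Values": ["running"]},
    ]
    for page in paginator.paginate(Filters=filters):
        for reservation in page["Reservations"]:
            ids.extend(i["InstanceId"] for i in reservation["Instances"])
    return ids

if __name__ == "__main__":
    ids = tagged_instance_ids("Schedule", "office-hours")  # hypothetical tag
    if ids:
        ec2.stop_instances(InstanceIds=ids)
```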
Many are going the cloud route because the low cost of entry makes it much easier to start developing, but eventually, yes, the cost skyrockets and you end up paying for putting little forethought into the real workload requirements.
One senior sysadmin/infrastructure engineer with on-prem experience is enough to scale up your colo presence. You can either assemble your own servers or custom order and have them shipped directly to the colo. Remote console and power management (PDUs) is a solved problem. Out of band management, also solved. Virtualization, solved. Push your VM snapshots, content, artifacts, and database backups into your object store of choice for offsite backups. Ideally, your colo facility is within driving distance of your office if you absolutely need physical access. Otherwise, you pay someone at the colo a few hundred bucks a month for remote hands services.
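For the offsite-backup piece, a minimal sketch with boto3; the bucket name and backup path are placeholders, and any S3-compatible object store works if you point endpoint_url at it:

```python
# Push nightly database dumps to an object store for offsite copies.
from datetime import date
from pathlib import Path

import boto3

s3 = boto3.client("s3")  # pass endpoint_url=... for a non-AWS object store

BUCKET = "example-offsite-backups"    # hypothetical bucket
BACKUP_DIR = Path("/var/backups/db")  # wherever your dumps land

def push_backups():
    prefix = date.today().isoformat()
    for dump in sorted(BACKUP_DIR.glob("*.sql.gz")):
        key = f"{prefix}/{dump.name}"
        s3.upload_file(str(dump), BUCKET, key)
        print(f"uploaded {dump} -> s3://{BUCKET}/{key}")

if __name__ == "__main__":
    push_backups()
```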
EDIT: k8s arguably makes this easier than in the past. You want to run on your own cluster? You can. You need to burst your containers to spot instances in someone else's cloud? You can.
With physical hardware, the sysadmin will spend a lot of his time fighting the hardware supplier and the colo to get hardware in place and running, then reinvent the tooling. Any trip to the facility is half a day lost.
Have you never lost half a day of getting work done because Amazon's EC2 API was failing to launch new instances? I have! 
To simply hand-wave away hosting your own hardware as infeasible and more costly (especially when the cloud has been proven to be more costly) is unproductive at best.
 https://hn.algolia.com/?query=AWS%20outage&sort=byPopularity... (HN search of "AWS Outage")
Self hosting is doable up to 2 or 3 servers with a very few services. Beyond that it's really suffering from the lack of tooling, the lack of isolation and the lead time of weeks to get anything more in place.
Dell wants to re-verify by phone three times before they bill or ship anything. Don't ask me why, I don't know; I keep begging them to stop doing that.
I've had the API down about 4 times in a year, if I remember the count well, always for specific instance types. Never more than a few hours. That's nothing compared to the weeks it takes to ship and set up physical hardware.
About the most complicated tooling we had was Anaconda.
I think I lost three hard drives in that time and all three were hotswapped with zero downtime under warranty. No other failures. And that's not an outlier - I've been doing this 25+ years and the hardware failure rate has been pretty consistently low.
The key phrase in your post is 'doing this 25+ years'.
What you're providing is the /confidence/ that you can do it. Most outfits nowadays probably have no one who has that.
At the ultra-small scale (1 person company), it's arguably cheaper to run from a single server you bought.
Mid-sized companies (~2-1000 people) probably benefit the most from cloud computing because they need fewer dedicated salaried employees to manage them.
Beyond ~1000 employees however, you're going to need that expertise regardless of whether you use cloud or on-prem. And the markup of those thousands of cloud machines will start to become significant. You'll need to do a cost/benefit analysis and I can see it going in either direction.
Rather, the cost savings come from the salaries of you and the "one other guy" you mentioned. If cloud hardware can bring your staff of 2 down to 1, then the cost savings are huge. More-so if they can bring it down to 0 and have devs manage the hosted infrastructure. Each of your salaries is likely near or higher than the cost of running all those machines.
Developers and Administrators have different priorities, and I can count on one hand the number of true "DevOps" people I know - the rest are either decent admins and poor devs or decent devs but poor admins. The idea is nice but the reality isn't quite what it's made out to be.
Physical instances involve too much over-provisioning and zero flexibility. You can easily be paying for an order of magnitude more than the capacity actually used. (This can be helped with VMware virtualization instead of bare metal, but the number of new companies using VMware is shrinking and licenses are expensive.)
Whereas with the cloud, you can halve any instance's CPU/memory to save half the money. There is even built-in monitoring to analyze usage. There is little waste.
The cost of the hardware is not the issue.
Paying for 600 machines when you actually need 300 probably "wastes" $50-80k/year in over-provisioned hardware. Hiring a skilled admin to figure out what's going on, optimize the network, and reduce those 600 machines down to 300 probably costs at least $150k/yr in their salary+benefits. The cloud does not decrease the cost of the hardware (it's actually more expensive), but it allows you to reduce the number of admins managing it. That's the cost savings.
Given the above example, what would any reasonable general manager do: hire someone to make things more efficient, or just pay for the over-provisioning?
600 servers at $10k each, that's $6M upfront. Then another $6M within 3-5 years to renew them.
Of course in that example you'd pay someone, even multiple people in fact, to make things more efficient.
I couldn't imagine having to manage 300 physical servers in a team of two. The hardware, OS and networking alone are more than a full time job with 24/7 oncall expectations. In fact that's why I don't do this anymore.
Pretty sure my current company has had more failed hard drives this month than you've had in all those years. I remember one shipment of servers with more dead motherboards than that.
The point is that managing on-prem equipment is the easiest it's ever been. When I started, the admin-to-box ratio was 1:50; now it's easily 1:300.
There are plenty of companies with their own datacenters of hundreds or even thousands of machines. Yes, these companies usually have deep pockets and their own dedicated staff to take care of these servers. This is how it was done before AWS existed, and is likely to continue long after AWS ceases to exist.
You need to get a Dell sales rep you deal with every time you order. You'll get better pricing too.
I don't mention AWS in my comment, not sure why you thought I was comparing the two.
That remote hands service is usually very limited. Datacenter staff usually don't have the time or the skills to do detailed troubleshooting, setup, or maintenance. Maybe they'll swap out a disk for you, power cycle a server, or turn a knob on a KVM switch. If you want more, you'll probably have to buy into a managed service, which they'll run on their own hardware with their own setup, and where you'll be paying closer to AWS prices.
Now if you want the kind of availability and redundancy that AWS offers you, you'll have to have a presence in multiple datacenters around the world, where these sorts of issues and others will multiply.
Wikimedia is serving Wikipedia (ranked #5 for Internet traffic) with ~550 servers out of Dallas, TX and Ashburn, VA for about $4 million/year in tech costs. OpenStreetMap infra is ~$120k-$150k/year (albeit with volunteers and some donated hosting capacity). Do you need Amazon to do better?
I am not trying to be a "stick in the mud". I am trying to save you money.
“A few cents an hour”. It’s a 30% markup over physical hardware, not to mention extortionate bandwidth pricing.
I played around with that, and then AWS/DO came along.
By the way, VMware is not exactly cheap either. I recall when I was selling VMware licenses 10 years ago, it started at around $5,000 per server.
For instance how you are charged for S3 traffic your buckets receive even if you don't have any files there. Or how traffic between zones that is outside of your control is still charged. Or how things like AZ replication cost a ton and you have no metric to show if it even makes sense to enable them. Heck, even usage alerts cost money.
And that person isn't alone. If you look around, you'll find a lot of services on the internet which directly link to S3 buckets using the AWS-provided endpoints.
Went with a small cluster of cheapo ARM64 scaleway servers (no bandwidth billing), which costs us ~15 Euro a month.
Much better than the likely ~$US1.5k with S3 just for transfer costs alone.
Can you say a bit more about these two things? When do you get charged for s3 traffic on a bucket with no files? What do you define as uncontrollable cross zone traffic here?
Can only imagine the insanity that goes on at huge Pinterest level scale. How many EC2 instances do they have? Thousands? Hundreds of thousands?
I've spent a lot of time helping web developers who don't have a ton of infrastructure experience get into AWS, and they almost always do it the wrong way (the most expensive way). As in, they take whatever they were running on-premise and just throw it on EC2.
Or their IoT talk? https://www.youtube.com/watch?v=v7oqSTmrfVc
I've seen so many people just create an EC2 instance for serving static content.
If you have a banging new React app that generates static files, you don't need anything more.
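As a sketch of the alternative (bucket name and build path are placeholders), sync the build output to an S3 bucket set up for static hosting, or put CloudFront in front of it, and skip the EC2 instance entirely:

```python
# Publish a static build directory to S3 instead of running a server.
import mimetypes
from pathlib import Path

import boto3

s3 = boto3.client("s3")
BUCKET = "example-static-site"  # hypothetical bucket
BUILD_DIR = Path("build")       # e.g. the output of `npm run build`

def publish():
    for path in BUILD_DIR.rglob("*"):
        if not path.is_file():
            continue
        key = path.relative_to(BUILD_DIR).as_posix()
        content_type = mimetypes.guess_type(path.name)[0] or "binary/octet-stream"
        s3.upload_file(str(path), BUCKET, key,
                       ExtraArgs={"ContentType": content_type})

if __name__ == "__main__":
    publish()
```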
Once worked at a place that needed to do rapid crawls of websites on short notice, and we saved a lot of money by moving from dedicated instances to Lambdas that we could scale out and down as needed.
How you use them is important.
..just like it's easier to get into debt with a credit card than with a personal loan. Credit card debt mounts one purchase at a time. Personal loans require you to decide to borrow €X.
The same costs that are being used to justify the value of the cloud are now being used to justify going back to buying servers. Which is the wrong solution, by the way.
With anything it just comes down to education and best practices. People need to utilize the incredible tools out there along with understanding better how the cost structures work.
E.g. 600 TB data out and 100 TB in is $37,300 on AWS, while with colocation a price of < $500 is very achievable.
That's ~1/75th of the cost!!!
There is nothing to understand here, except that AWS is insanely expensive and colocation is a way better option in this situation.
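As a back-of-the-envelope check (the tier rates below are illustrative and vary by region and over time, so this lands in the same ballpark rather than reproducing the exact figure; transfer in is free, which is why only the 600 TB out is costed):

```python
# Tiered data-transfer-out estimate with assumed example rates.
TIERS_GB = [
    (10 * 1024, 0.09),     # first 10 TB/month
    (40 * 1024, 0.085),    # next 40 TB
    (100 * 1024, 0.07),    # next 100 TB
    (float("inf"), 0.05),  # everything beyond 150 TB
]

def egress_cost(total_gb):
    cost, remaining = 0.0, total_gb
    for tier_size, rate_per_gb in TIERS_GB:
        chunk = min(remaining, tier_size)
        cost += chunk * rate_per_gb
        remaining -= chunk
        if remaining <= 0:
            break
    return cost

print(f"${egress_cost(600 * 1024):,.0f}/month")  # roughly $35k, same order as the figure above
```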
Sometimes you don't need that.
AWS S3 provides reduced-redundancy options for less cost. It'd be nice if the same applied to other services.
You're not gonna get a dedicated 10 Gbps link for a few hundred bucks with a free server.
Let's say you spread that across 10 dedicated instances ("optimized droplets") on DigitalOcean, at a cost of $800. Droplet transfer is pooled at the account level, so that's 50 TB included with the instances. The remaining 550 TB would cost you $5,500.
$6,300 vs $37,300 and you get ten servers with DO.
And if you really wanted to be creative, spin up 100 $20 instances, which would cost you $2,000 and give you 400tb of pooled transfer. Then spin up the 10 $80 optimized droplets to use for actual work (maybe there would also be some use case for the 100 standard droplets with 2vCPUs and 4gb ram; otherwise idle them). That gets you down to $4,300, almost a tenth of what AWS is charging just for the transfer.
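The math for those scenarios, as a quick sketch (the per-droplet transfer allowances are inferred from the totals above, so double-check DigitalOcean's current pricing):

```python
# Instance cost plus pooled-transfer overage at $10/TB ($0.01/GB).
OVERAGE_PER_TB = 10.0

def monthly_cost(droplets, total_transfer_tb):
    """droplets: list of (count, monthly_price_usd, pooled_transfer_tb_each)."""
    instance_cost = sum(count * price for count, price, _ in droplets)
    pooled_tb = sum(count * transfer for count, _, transfer in droplets)
    overage_tb = max(0, total_transfer_tb - pooled_tb)
    return instance_cost + overage_tb * OVERAGE_PER_TB

# 10 optimized droplets at $80 (assumed 5 TB pooled transfer each -> 50 TB pool)
print(monthly_cost([(10, 80, 5)], 600))                 # 6300.0
# plus 100 x $20 droplets (assumed 4 TB each) purely for their transfer allowance
print(monthly_cost([(10, 80, 5), (100, 20, 4)], 600))   # 4300.0
```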
When it comes to transfer, AWS is a minimum of 5x more expensive than it should be even if you assume a premium for the service. It's a big fat money maker for them.
That being said, I wouldn't try to push the limits of DigitalOcean too much. I have an ex-coworker who looked into moving a compute grid to DigitalOcean; after pinning a number of instances at full CPU, he quickly got contacted by support and told to stop before they shut down the account.
Clearly, Digital Ocean didn't want to serve that use case. They don't want customers to use all the resources that they are paying for. They're probably under provisioning power and cooling by a large factor.
Otherwise DO would get sued for threatening customers unfairly.
I've personally known Corey for 10+ years and he's a good dude, so if anyone is looking to lower their AWS bills, talk to Corey.
these companies are a dime a dozen, no personal offense intended to yourself or this company.
I know him from back when I was helping maintain the Saltstack python config management tool. He was a user and we were doing training on contributing code to salt. It turned out that I did teach him a thing or two and he did several contributions to salt. That's literally it.
https://threadreaderapp.com/thread/1091041507342086144.html (original Twitter link: https://twitter.com/QuinnyPig/status/1091041507342086144)
That's all it has taken to save a lot of money.
i run a large AWS consolidated billing family for a sizeable computer science graduate research lab, and tracking our AWS spend is an absolute nightmare.
if you think about it for a minute, why would it be in the best interests of amazon/AWS to allow us granular, easy to parse and (most importantly) easy to customize and use billing reports? they're in the business of making money, and providing mechanisms for folks to limit their spends will limit profits.
another thing to consider is that the billing subsystems for all of these cloud providers are literally one of the first things to be engineered, and after release, one of the last to ever be updated.
for instance, it took amazon two years to release OUs (organizational units): one year in closed beta, one in open. while these are great for organizing accounts, you still can't see how much an OU spent w/o a bunch of python/boto gymnastics!
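for what it's worth, the gymnastics look roughly like this (the OU id and dates are placeholders, nested OUs would need a recursive walk, and this is only a sketch, not any particular tool):

```python
# sum one OU's spend by listing its member accounts and filtering Cost Explorer
# by linked account.
import boto3

org = boto3.client("organizations")
ce = boto3.client("ce")

def ou_account_ids(ou_id):
    ids = []
    paginator = org.get_paginator("list_accounts_for_parent")
    for page in paginator.paginate(ParentId=ou_id):
        ids.extend(account["Id"] for account in page["Accounts"])
    return ids

def ou_spend(ou_id, start, end):
    resp = ce.get_cost_and_usage(
        TimePeriod={"Start": start, "End": end},
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        Filter={"Dimensions": {"Key": "LINKED_ACCOUNT",
                               "Values": ou_account_ids(ou_id)}},
    )
    return sum(float(r["Total"]["UnblendedCost"]["Amount"])
               for r in resp["ResultsByTime"])

print(ou_spend("ou-xxxx-xxxxxxxx", "2019-01-01", "2019-02-01"))
```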
i was on a call w/some of the leads of the aws billing subsystem about a year back, and asked about what the roadmap for billing features and the response was "2020 at the earliest".
Far better than any internal tool I had the chance to work with.
since the aws billing subsystem can't map OU->$$$, i wrote a thing: https://github.com/ucbrise/aws-audit
it'd be also super awesome to have OU-based budgets.
we use teevity ice when we really need to dig deep (https://github.com/Teevity/ice) but it's pretty much abandonware at this point.
i've also explored a couple of commercial offerings (ronin & teevity) but they don't work well in our AWS research credit-based model.
In commercial tools, I would have a look at https://www.cloudhealthtech.com/ and https://cloudcheckr.com/ and there was a third I can't remember the name.
They should have good reporting since that's the only point of the tool, but they charge an arm and a leg.
with consolidated billing, and a couple of hundred linked accounts (with plenty of turnover), enforcing sane tagging is pretty much out of the question.
thanks for the links tho! when i have a spare few minutes i'll definitely take a closer look.
Enforce some tags: purpose, team, business unit, dev or prod. It shouldn't be allowed or possible to create instances without any information. Enforce the rules: if an instance has no information, it will be shut down next week and deleted next month. People will take tagging seriously very soon.
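A rough sketch of the enforcement side with boto3; the tag keys and the stop-only action are policy choices here, not a prescription:

```python
# Stop running instances that are missing any of the required tags.
import boto3

REQUIRED_TAGS = {"purpose", "team", "business-unit", "environment"}  # example keys
ec2 = boto3.client("ec2")

def untagged_instances():
    offenders = []
    paginator = ec2.get_paginator("describe_instances")
    filters = [{"Name": "instance-state-name", "Values": ["running"]}]
    for page in paginator.paginate(Filters=filters):
        for reservation in page["Reservations"]:
            for instance in reservation["Instances"]:
                tags = {t["Key"].lower() for t in instance.get("Tags", [])}
                if not REQUIRED_TAGS <= tags:
                    offenders.append(instance["InstanceId"])
    return offenders

if __name__ == "__main__":
    offenders = untagged_instances()
    if offenders:
        ec2.stop_instances(InstanceIds=offenders)  # warn owners first in practice
```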
I've seen businesses adopt AWS without any strategy and with revolving contractors as employees. A few months later, it's a hundred instances without a single name or tag, the more expensive ones costing thousands of dollars a month.
If the company cares about costs and resources management, it's easy to achieve. If it doesn't care, then doesn't matter, not your money.
Besides giving businesses more legibility into what specific parts of their business logic cost to operate vs. the value they generate, you can start building higher-order financial systems based on flows of capital+information within businesses. From there you can implement all sorts of financial engineering like insurance and options+derivatives that could allow businesses to do things like dial up leverage against these flows. Certainly half-baked ideas, but fun to think about the possibilities.
One is that not everything will fit. Lots and lots and lots of workloads exist in their current form and are not going to switch overnight. Whatever bookkeeping method that gets developed needs to deal with the mix of costing models.
Another is that you still need to assign costs. In a lot of companies this is the subject of intense corporate politics. Whether such a method is adopted will depend on who is wielding what cudgel.
Related to which: there will be cases of premature optimisation on the visible. Optimising for cost is fine, but costs often include estimates that can be overlooked. It's one thing to optimise for "least dollars spent per invocation". Another thing to optimise for "fewest pissed off customers". The latter is harder to measure but in many cases more important.
But overall? Yes. I think it could be a step forward.
The other part, though, is that dev teams in large companies often aren't held accountable for on-prem costs.
The on-prem costs are often in a big baseline number with little visibility into how much of that big number belongs to each team.
As margin continues to expand, the need for alternative models and competition in 'Cloud' is becoming increasingly apparent.
Throughout the process, the word "cloud" was brought up. Whose cloud would we use? Over and over again I tried to explain we were actually building a "cloud" service. And by that I mean we are just offering a service that runs on our servers.
Due to the feature set and cost, we ended up settling on bare metal servers from SoftLayer (IBM). The entire solution has been made to run on commodity hardware.
For the longest time the company kept marketing that we were using "IBM's Cloud".
Every now and then I get a request to ask what it would take to move to AWS. My response is always the same. More operating budget, and less features.
We spin up DR environments for many servers in seconds -- and can do so because we have full control over the hypervisor, so moving to anything like AWS, Azure, or whatnot means we give up the ability to do near-instant restores. Owning the entire stack has its own set of problems, but in the long run we should be able to move much faster.
"The Cloud" -> "The Internet"
- If you want the AWS dashboard metrics to have 1 minute instead of 5 minute granularity, that's $2.10 per instance per month. 5 minutes is pretty useless, so either you set up other monitoring or you pay a tax on the number of instances you have.
- If you use AMIs (which you probably should) to launch EC2 instances with all your software baked in already, you will probably end up with dozens or hundreds of old, unused AMIs. Furthermore each of these AMIs is linked to a snapshot, which is stored on S3. S3 pricing is very cheap but it's a significant amount of work to determine which AMIs are no longer in use and to delete both the AMI and the corresponding snapshot. Every 100 AMIs you have at the standard 8GB root volume size costs you $18/month.
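A rough sketch of that cleanup with boto3; "in use" is defined narrowly here (AMIs referenced by existing instances only), so launch templates and Auto Scaling groups would need extra checks:

```python
# Deregister self-owned AMIs no current instance references, then delete
# their backing EBS snapshots. Dry-run by default.
import boto3

ec2 = boto3.client("ec2")

def image_ids_in_use():
    used = set()
    for page in ec2.get_paginator("describe_instances").paginate():
        for reservation in page["Reservations"]:
            used.update(i["ImageId"] for i in reservation["Instances"])
    return used

def cleanup(dry_run=True):
    in_use = image_ids_in_use()
    for image in ec2.describe_images(Owners=["self"])["Images"]:
        if image["ImageId"] in in_use:
            continue
        snapshot_ids = [m["Ebs"]["SnapshotId"]
                        for m in image.get("BlockDeviceMappings", [])
                        if "Ebs" in m]
        print("would remove" if dry_run else "removing",
              image["ImageId"], snapshot_ids)
        if not dry_run:
            ec2.deregister_image(ImageId=image["ImageId"])
            for snapshot_id in snapshot_ids:
                ec2.delete_snapshot(SnapshotId=snapshot_id)

if __name__ == "__main__":
    cleanup(dry_run=True)
```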
We previously had this issue. What we do now is instead of leaving old ones on S3, we download and archive them. We just tag them on the dashboard and a nightly script does the rest.
In general tagging is an extremely powerful way to organize your AMIs. This article has good examples of tagging you can follow. We use something akin to the "Business Tag" strategy, with an Excel document. Definitely requires some internal organisation but the cost savings speak for themselves.
If you're going for AWS, you should consider rewriting a lot of your services to use AWS features well.
Auto Scaling Groups and Fargate EC2 are some common components I see few companies using.
If your instances are the same size through the day, you are doing it wrong!*
*Exception: If you provide the same level of service and traffic 24/7.
Anecdata is terrible, but in my experience, running 1 Fargate container 24x7 on the lowest specs (0.25 CPU and 512 MB RAM) to handle baseline load was going to cost as much as a no-contract T3.micro EC2 instance with much more capacity (2 CPU and 1 GB RAM). AWS Kubernetes was also a bust at $120/mo just to get started (that's the cost before a single EC2 server is provisioned).
EC2 is just EC2 with scaling groups while ECS is a fully non-managed solution that's for small items.
I love running stuff in the cloud, but without a conscious decision in an organization to prepare for, and avoid, the various pitfalls, it's just a giant cost sink.
Figuring out how/if it's used is really convoluted. And getting out early is a pain.
> You can choose to pay for your Reserved Instance upfront, partially upfront, or monthly
Long story short, only pay 1 year full upfront.
I'm working on a Kubernetes configuration at home that provides the common & popular features of AWS in a consistent & portable manner: blob store, database, etc.
I've moved all (well, like 2) of my personal sites off to another VPS provider, which starts at $5/month for a 512 MB VPS.
The worst annoyance is 'Reserved instances'. You are basically signing up for a long term contract that you can't get out of easily.
Netflix runs a spot market for instances to drive down costs, for exactly the reasons mentioned here.
As AWS Use Soars, Companies Surprised by Soaring Bills
In this case, the history of the users commenting is enough to make it overwhelmingly unlikely that they were paid, and the well-known propensity of users to comment just on a title clinches it.
Still not a great comment, I admit, but with the amount of discussion going on, I would be surprised if everyone is a subscriber or if they are all going off the headline alone.