(1) your business scales at a different rate than you planned -- either faster or slower is a problem!
(2) you have traffic spikes, so you have to over-provision. Doing it yourself then forces a tradeoff: do you pay for infrastructure you barely ever use, or do you have reliability problems at peak traffic?
(3) your business plans shift or pivot
A big chunk of the Amazon price should be considered as paying for future flexibility. It isn't fair to compare prices backwards-looking, where you know what your actual needs were and can compare what it would have cost to meet them on AWS vs in house.
The valid comparison is forward-looking: what it will cost to meet needs over an uncertain variety of scenarios on AWS compared to in-house.
The corollary of this is that for a well-established business with predictable needs, going in-house will probably be cheaper. But for a growing, changing, or inherently unpredictable business, the flexibility AWS sells makes more sense!
Your comment makes it sound like unless you know your growth pattern, can predict your spikes, don't plan to pivot, and know all these things perfectly, you will lose. That's not the case. The reason is that there is a quite significant cost difference between AWS and DIY. DIY is cheaper by a large enough margin that you might be able to buy, say, double the capacity you need and still spend less. So if you misjudged your growth and your spikes by up to 2x, you are still fine.
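To put toy numbers on that margin argument (all figures are made up for illustration, not real quotes): if DIY is roughly 3x cheaper per unit of capacity, you can over-provision 2x as a safety buffer and still come out ahead.

```python
# Toy model: hypothetical per-unit-of-capacity monthly costs (illustrative only).
AWS_COST_PER_UNIT = 300.0   # assumed dollars/month for one "unit" of capacity on AWS
DIY_COST_PER_UNIT = 100.0   # assumed ~3x cheaper when self-hosted/leased

def monthly_cost(units_needed, cost_per_unit, overprovision_factor=1.0):
    """Cost of provisioning units_needed * overprovision_factor units."""
    return units_needed * overprovision_factor * cost_per_unit

needed = 10
aws = monthly_cost(needed, AWS_COST_PER_UNIT)        # provision exactly what you need
diy = monthly_cost(needed, DIY_COST_PER_UNIT, 2.0)   # buy double, as a mistake buffer

print(f"AWS (exact fit): ${aws:,.0f}/month")   # $3,000/month
print(f"DIY (2x margin): ${diy:,.0f}/month")   # $2,000/month
```

Even with the 2x buffer baked in, the DIY line is cheaper under these assumed rates, which is the whole point: you don't need perfect forecasts, just forecasts within the cost gap.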
Even if you are a small operation, you still have the option to lease hardware. Then your response time to add new servers is usually hours, not days like if you were to go the full own-your-hardware route.
As an exercise, you can try to rent and benchmark a $300/month server from e.g. SoftLayer and then compare that against a $300/month EC2 instance. Chances are, you will be blown away by the performance difference.
But the calculation is harder than that. People are terrible at estimating the ops personnel cost for DIY. Turns out it's hard to run with minimal downtime, or to engineer hardware + software to be fault-tolerant. It's hard to put a price on dev tools/APIs/doco/the whole ecosystem.
Especially for that last reason, I have never been "blown away" by Softlayer, even when their instances were way beefier than anything AWS/GCP offered. YMMV.
It's hard to do that with cloud too. Your instances can go down at any time.
What you're trading off is a small number of people on-site at the data center (and it can in fact be small!) plus some slightly-old-school sysadminning versus new-school cloud-native development. Maybe the latter is easier to hire for (although I doubt that) or your current people find it more fun. But it's not like physical hardware dies all the time, or that you're immune from seeing the effects of physical hardware dying just because you're on the cloud.
You might save on hiring one person to replace disks when they fail and one person to write some PXE boot automation, but you'll need more than two people to run a Kubernetes cluster at the same scale of compute capacity as those two people could have run for you.
If you are willing to pay a lot of money upfront, which you are since you're talking about building your own datacenter, AWS and Google will both give you huge discount and be significantly cheaper than SoftLayer.
Memory is easy to compare. But how does it stack up CPU-wise?
Another question is whether lower tier servers compare favorably or not. How does a $200-500/month server compare?
Lastly, is it possible that SL is just not competitive anymore, but another provider is? I have gotten out of that gig a few years ago so I honestly don’t know, but is it possible that Hetzner or someone similar is now the leader in leased hardware?
SoftLayer is a disaster on lower tier servers. You only go to them for higher tiers.
The market is very competitive and moving fast. Both Google Cloud and AWS have been rolling out high memory hardware recently at a reasonable rate.
Hetzner is amateur tier. It's cheap second-hand servers with no ECC. It's not comparable to enterprise infrastructure.
(But then they were bought by a company whose "cloud strategy" is to put the word "cloud" in as many DB2 and mainframe contracts as possible.)
It's noteworthy that SoftLayer is now worse on all aspects.
I wasn't sure what your use of "heavy" means here -- is it "a lot of" or "disproportionate"? Years ago there was much less flexibility with IaaS machine shapes, but I was super impressed when Google launched "custom machine types". There's a little slider to scale up/down your RAM/CPU independently, and of course storage is already independent. In fairness, there is some correlation between CPU allocation and storage IOPS, but that's inherent to scheduling a portion of the host machine reasonably.
Second, it depends on your expertise. And you don't need scale. I ran a successful bit of infrastructure on SL with seven servers that was quite a bit cheaper than an equivalent AWS setup. I was pretty much solely responsible for the infrastructure, and still my responsibilities included more than 80% coding and later management. Given my salary at the time, it was quite cost effective.
if you have experience (and/or your team does) with dedicated, that should probably be a non-trivial factor in decision making. likewise if you have more aws/cloud experience (or have those skills available on staff), there's benefits to that.
cloud skills don't magically happen - someone with minimal AWS skills can provide you with a setup that is more expensive to run, and possibly more insecure than a locked down dedicated box.
Handling variable load is not free in the cloud, so if you are going to pay it anyway, you can keep your predictable cost down and use the cloud when you have that temporary extra load.
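A toy sketch of that hybrid math (all rates are hypothetical, just to show the shape of the tradeoff): own or lease the predictable baseline at a flat rate, and pay on-demand cloud prices only for the rare spike hours.

```python
# Toy hybrid-cost model (every rate here is an assumption, for illustration only).
BASELINE_UNITS = 10          # capacity you need essentially all the time
SPIKE_UNITS = 20             # extra capacity needed only during spikes
SPIKE_HOURS_PER_MONTH = 40   # spikes are rare
HOURS_PER_MONTH = 730

DIY_UNIT_MONTH = 100.0       # assumed flat monthly cost per owned/leased unit
CLOUD_UNIT_HOUR = 0.50       # assumed hourly on-demand cost per cloud unit

# Everything on-demand, including the 24/7 baseline:
all_cloud = (BASELINE_UNITS * HOURS_PER_MONTH
             + SPIKE_UNITS * SPIKE_HOURS_PER_MONTH) * CLOUD_UNIT_HOUR
# Owned baseline, cloud only for the spikes:
hybrid = (BASELINE_UNITS * DIY_UNIT_MONTH
          + SPIKE_UNITS * SPIKE_HOURS_PER_MONTH * CLOUD_UNIT_HOUR)

print(f"all-cloud: ${all_cloud:,.0f}/month")
print(f"hybrid:    ${hybrid:,.0f}/month")
```

Under these made-up rates the hybrid comes out well ahead, because on-demand pricing is only competitive for capacity you don't need around the clock.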
Outgrowing AWS is a great problem to have.
> For a Linux user, you can already build such a system yourself quite trivially by getting an FTP account, mounting it locally with curlftpfs, and then using SVN or CVS on the mounted filesystem. From Windows or Mac, this FTP account could be accessed through built-in software.
Dropbox is not even an option for me because there is no official support or reliable way to run it on OpenBSD.
Dropbox is a consumer product with limited features and use cases. Smugly dismissing the needs and technical capabilities of power users is not going to endear you to anyone here.
I agree that hashkb's comment was flippant and unhelpful, but your reply was not any better and was the one that went off-topic.
1. a nice web interface;
3. an easy-to-understand pricing model;
4. tightly integrated desktop support (at least for OS X).
S3 and EC2 are by design commodity, low level artifacts. Like any other commodity, you should always be shopping around.
The big issue with Amazon has always been their “roach motel” philosophy where data goes in but doesn’t come out. (ie they hose you with network costs)
If you have enough scale and enough dollars, it is often cheaper to build bespoke infrastructure vs renting somebody else’s at margin.
This is why many companies with a lot of money still rent their offices, even when they intend to stay in the (often customized) buildings and so don't need the flexibility. It just would not make sense to buy and to bind the capital in a building when it can be invested in their core business with higher returns.
An exception might be made in cases where a company suddenly has so much money that investing it all in its core business might diminish the capital/return rate, e.g. through the US tax break, where a company shifts money stored overseas back to the US. This is why for Google it might make sense to buy property, even though it's not their business: they just have too much money.
S3 at $0.023/GB/month * 500 GB = $11.50/month
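Storage alone is only half the bill, though; as noted elsewhere in the thread, it's the network transfer that hoses you. A quick sketch (the $0.023/GB storage rate is from the line above; the ~$0.09/GB egress rate is an assumption, roughly AWS's first internet-egress tier at the time, so check current pricing):

```python
# Storage cost vs. storage-plus-egress cost. The egress rate is an assumed
# figure (~first internet tier); real bills depend on tiering and region.
GB_STORED = 500
S3_STORAGE_PER_GB_MONTH = 0.023   # from the thread
EGRESS_PER_GB = 0.09              # assumed

storage = GB_STORED * S3_STORAGE_PER_GB_MONTH   # the $11.50/month from above
# Suppose you also serve the full data set out to the internet once a month:
egress = GB_STORED * EGRESS_PER_GB              # roughly 4x the storage line

print(f"storage: ${storage:.2f}/month   egress: ${egress:.2f}/month")
```

Data at rest is cheap; it's reading your own data back out that dominates, which is exactly the "roach motel" dynamic.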
A recent place I worked at didn't understand this. They were going to the cloud as a strategic move because they didn't want to run a data center any more. Their computing needs were stable and predictable - no "Black Friday" sales that needed triple the servers for a short part of the year. They were going to end up paying far more for the total computing cost than if they had just kept buying blade servers and VMWare licenses.
I'd argue that companies where the hosting costs are the primary, or even a significant cost, are a small minority.
Granted, I think this is more a case of a prisoner's dilemma between lower management and upper management (lower management doesn't want to work on it because it doesn't produce features that make them look good, but they also don't want to propose the additional developer for it, because then upper management will just tell lower management to do it without the additional developer).
Fire the team, come back a few years later with a new, cheaper team and in-house pricing.
For most startups I would actually advise to start with Heroku, which is even more expensive than AWS (it is built on top of AWS). But you save on devops and building the CI/CD pipeline. For a small team it can make a big difference in terms of productivity.
For worker workloads like CI, renting dedicated hardware (like Hetzner) is usually cheaper and produces more reliable results. Spot instances also work but have less reliability due to machines cycling. The main factor for keeping everything under AWS would be egress bandwidth pricing, or if the workload spikes are bigger than 2x.
For number 2 especially, there have been some cool projects for efficiency gains when different parts of the organization experience different traffic spikes. Like Netflix, where transcoding spikes precede viewing spikes, and they pulled off a load balancing coup to reduce peak instances.
I think the right thing for a tech company to do is to run their own data center in at least one town where they operate, and use cloud services for geographic distribution and load shedding.
The temptation to reduce your truck numbers below long-term sustainable rates is too great, and so is lock-in. The best thing I think you can do for the long-term health of your company these days is to hold cloud infrastructure at arm's length. Participate, but strongly encourage choosing technologies that could be run in your own server rooms or on another cloud provider, like Kafka, Kube, and readily available database solutions. Make them avoid the siren song of the proprietary solutions. We have already forgotten the vendor lock-in lessons of the 90's.
Much easier to go hybrid on Azure, at least until AWS and vSphere integration is ready for prime time
A startup I talked to not too long ago has a revenue of around 1M a month - pretty great. Their Amazon bill is around 1M a month - not so good. Their entire architecture is one AWS service strung into another - no optionality at all.
You're talking about vendor lock-in. Which is a totally valid point and something to be aware of, but basically orthogonal.
> (2) you have traffic spikes, so you have to over-provision. Doing it yourself then forces a tradeoff: do you pay for infrastructure you barely ever use, or do you have reliability problems at peak traffic?
> (3) your business plans shift or pivot
Anyone at any real scale has a multi-datacenter setup and AWS is effectively just a very elastic datacenter you can tap into. You could just as easily tap into Google Cloud or Azure. You do not need to operate 90% of your business in AWS to use AWS.
> The corollary of this is that for a well-established business with predictable needs, going in-house will probably be cheaper. But for a growing, changing, or inherently unpredictable business, the flexibility AWS sells makes more sense!
It's still cheaper to go in-house with a multi-DC setup than anything but a single-DC setup in AWS with very few nodes.
Architecture for a decent-sized business should be set up with 3 DCs in mind /anyway/. You have two primaries, you scale <insert cloud provider> as needed and only leave some DB servers there most of the time.
Another commenter on this story mentioned $100k redshift workloads that were optimizable to $2k.
Yes. AWS shines when you have a small team with no competent DevOps people and a desire to build a single datacenter setup. However, at that scale...we are talking a small business / startup with less than 50 employees. If you are still doing that when you've grown past that point, you've got architectural problems.
Once you scale past that point, you still need a competent sysadmin to build a reproducible process...at which point even leasing hardware from any provider works. You cannot build a multi-DC setup that integrates redundant providers without a competent sysadmin.
Even if you stick with how AWS has you do things it will eat you alive if you do not optimize your workloads. I'm not sure how you would do that effectively without an experienced sysadmin acting as an analyst.
Also, in my opinion the know-how and full control of the complete stack is paramount to maintain a quality service.
In addition, you can do both. Tying yourself to one provider is an unnecessary risk and tying yourself to the cloud is not ideal.
I assume that answer is a no.
For me the logic is more like: get cheaper machines (be it in-house or with cheaper alternatives), have them run kubernetes for example, and monitor them with Prometheus.
If you run out of capacity, defined by whatever metric you fancy from Prometheus, start EC2 machines for the burst.
Every month, re-evaluate your base needs.
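A minimal sketch of that loop, assuming Prometheus is reachable at localhost:9090 and exports the standard node_exporter CPU metric; the actual burst action needs boto3, AWS credentials, and a real AMI id, so everything in it is a placeholder.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS = "http://localhost:9090"  # assumed Prometheus address; adjust to taste

def cluster_cpu_utilization(prom_url=PROMETHEUS):
    """Average non-idle CPU fraction across the fleet, via the Prometheus HTTP API."""
    query = 'avg(1 - rate(node_cpu_seconds_total{mode="idle"}[5m]))'
    url = f"{prom_url}/api/v1/query?query={urllib.parse.quote(query)}"
    with urllib.request.urlopen(url) as resp:
        body = json.load(resp)
    # Instant-vector result: [{"metric": {...}, "value": [timestamp, "0.42"]}]
    return float(body["data"]["result"][0]["value"][1])

def should_burst(cpu_util, threshold=0.8):
    """Pure decision rule: burst into EC2 once utilization crosses the threshold."""
    return cpu_util >= threshold

def launch_burst_instances(count=2):
    """Hypothetical burst action -- all parameters here are placeholders."""
    import boto3  # only needed if you actually burst
    ec2 = boto3.client("ec2")
    ec2.run_instances(ImageId="ami-0123456789abcdef0", InstanceType="m5.large",
                      MinCount=count, MaxCount=count)
```

The metric and threshold are whatever you fancy, per the comment above; the useful part is keeping the decision rule pure and separate from the side effect so it's easy to test and re-tune monthly.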
Now DO, vultr, and other options exist to fill the gaps more.
AWS was never the solution to content distribution or infrastructure replacement. Just a company smart enough to notice big gaps and not afraid to fill them to get the ball rolling, maintain and progress, then move on.
- You're too small for efficient economies of scale on your own equipment (i.e. AWS is cheaper when considering total cost of ownership).
- You need to scale rapidly to meet demand
The second one is largely a data issue: if you have enough historical data on your customers (their habits, usage, and so on), then scaling becomes predictable, and even when it isn't you could offload only part of your infrastructure to a cloud vendor.
What's interesting is that several companies that I know which rely on AWS/Azure/et al aren't on it for either of the two above stated "good" reasons.
They are large businesses and do almost no automated scaling. They're on it for what can only be described as internal political limitations: they are on these services to remove the politics of technology infrastructure. One less manager at the top, a shorter chain of communications, an external party to blame when something does go wrong, and issues like HR/benefits/etc for infrastructure employees are outside the scope.
In effect they view themselves as "not a technology company" so look to employ the fewest technology employees as they can. Even in cases where technology is paramount to their success. It is very interesting to watch, and I'm not even claiming they're "wrong" to handle their infrastructure this way, just that it is hard to quantify the exact reasoning for it.
I'd argue that the core competency of Dropbox is its easy syncing. Dropbox wanted to get that to market quickly. If they had spent the time building out a data storage solution on their own, it would have meant months or years of work before they had a reliable product. Paying AWS means giving Amazon some premium, but it also means that you don't have to build out that item. It's not only about economies of scale and rapid demand. It's also about time to market.
I think it's a reasonable strategy to calculate out something along the lines of "we can pay Amazon $3N to store our data or store it ourselves for $N. However, it will take a year to build a reliable, distributed data store and we don't even know if customers want our product yet. So, let's build it on Amazon and if we get traction, we'll migrate."
S3 is a value-added service and creating your own S3 means sinking time. Even though data storage is very very near to Dropbox's core competency, it's really the syncing that was the selling point of Dropbox. To get that syncing product in front of customers as fast as possible, leveraging S3 made a lot of sense. It gave them a much faster time to market.
As time went on, they had traction, and S3 costs mounted, it made sense for them to start investing in their own data storage.
It's about figuring out what's important (the syncing is the product) and figuring out what will help you go to market fast (S3) and figuring out how to lower costs after you have traction (transitioning to in-house storage).
Yes, a lot of companies use cloud services when they don't need them. However, Google Cloud's compute pricing is reasonably similar to DigitalOcean (with sustained usage discounts) and from what I hear these companies will often negotiate discounts. AWS can seem a bit pricy compared to alternatives, but I'm guessing that Amazon offers just enough discounts to large customers that they look at the cost of running their own stuff and the cost of migration and Amazon doesn't look so bad.
Still, when you're trying to go to market, you don't want to be distracted building pieces that customers don't care about when you can rent it from Amazon for reasonable rates. You haven't even proven that someone wants your product yet and your time is better spent on delivering what the customers want rather than infrastructure that saves costs. As you mature as a company, the calculus can change and Dropbox seems to have hit that transition quite well.
I imagine this could be the case for a lot of smaller tech startups, and perhaps even some larger companies that don't have significant web traffic or ongoing real-time computing services.
Something like Gusto might be a good example. I would guess that each of their paying customers (employees of companies using them) leads to only a handful of initial or yearly setup tasks and maybe a handful of web requests per month, but represents solid revenue.
The most obvious counterexamples would be any company with persistent real-time services, like Dropbox or Mixpanel, or companies with a huge number of web requests with a very small rate of conversion to revenue, like an ad network or an ad-supported social network or media site.
Of course, the dollar amounts dropbox saved are compared to those negotiated prices.
In my experience it isn't so much the storage price itself, but the network transfer that makes AWS absurdly expensive.
Most companies don't need something like S3. They can perfectly suffice with one server, maybe using RAID-1, or just using backups. Data corruption mostly happens through logical errors anyway and nothing in S3 will protect you from that.
S3 supports object versioning, which very much will protect you from anything other than writing the wrong data for the entire history of an object.
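To see how versioning protects against a bad overwrite, here's a toy in-memory analogue of a versioned bucket (not the real S3 API, just the idea: every put appends rather than replaces, so the old bytes survive a buggy write):

```python
class VersionedStore:
    """Toy in-memory analogue of an S3 bucket with versioning enabled."""

    def __init__(self):
        self._versions = {}  # key -> list of values, oldest first

    def put(self, key, value):
        """A write never destroys data; it appends a new version."""
        self._versions.setdefault(key, []).append(value)

    def get(self, key, version=-1):
        """Latest version by default; pass an index to read an older one."""
        return self._versions[key][version]

store = VersionedStore()
store.put("report.csv", "good data")
store.put("report.csv", "corrupted by a logic bug")   # the bad overwrite

assert store.get("report.csv") == "corrupted by a logic bug"   # what readers see
assert store.get("report.csv", version=0) == "good data"       # still recoverable
```

That's what the parent means: a logic error that writes garbage clobbers the *current* version, but the history is still there to roll back to.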
Anyway, there's plenty of filesystems that have versioning built in as well.
Besides, I've been tasked with recovering the specific state of a database where every version of every object was available, and it was only some 200k or so records. That took me about 2 weeks. (mostly for writing code that could find a consistent version of the whole thing)
Given the storage use case of Dropbox, what would the percentage savings be if Dropbox indeed went with Google or DigitalOcean?
In every one of those cases the group in charge has been the wrong group and it really makes you wonder who has been asleep at the wheel so long that this has occurred.
Maybe outsourcing to AWS for a couple of years is a good way to reboot the organization. Cheaper than slowly going out of business. When the fad dies down you start hiring people back who are a little more humble and cheaper than AWS.
It seems like this problem happens in organizations that equate headcount with power. Which I guess makes some sort of sense but doesn’t feel right. Plenty of companies do not have the majority of their employees working on producing the goods they sell. Especially if they’ve started automating.
But as I said above, this is an ‘asleep at the wheel’ situation. It seems like it’s often not the biggest problem these companies have with their vision.
And, it sometimes happens that actual tech companies start outsourcing their tech, which is a whole other troubling pattern.
Heh, at one previous company it would take 6-9 months to provision a new VM and 12-18 months for a physical. Those entrenched IT organisations absolutely deserve to get their lunches eaten.
Buuuut at my current workplace I am starting to see some slowdowns in doing AWS stuff as "departments" get more involved.
Like the "cloud team" does the account but networks must provision the VPC, and there's a "gateway review board" that gets involved if you define a new network egress or ingress etc etc.
I feel like many of the early advantages of cloud in enterprise are going to get eroded as the paper pushers catch on and "add value by defining process".
Yes, this is totally antithetical to modern working practices such as "DevOps"
At that place e.g. the VM team could provision you a VM (eventually) but they couldn't do you a login on it, that was some other team. But you couldn't raise the paperwork against that team until the VM was created, so everything proceeded serially, and each step at a glacial pace. The time ratio between them and Azure or AWS is literally under one minute for every month!
There was an old saying that 80% of outages were caused by humans. But there is more to account for than that. For example:
- Certifications that the org has obtained
- Contracts that the org has signed with customers
- Compliance with laws and regulations (and international)
- Insurance requirements
It can add up in a hurry and can slow an org down quickly.
It would be nice if an org could review all of its failures every few years and drop nearly all of its processes and procedures and start out fresh again - vowing to build around the failures of the past.
Indeed. That IT department is learning a lesson of what happens when your end users discover cloud providers and come into the office and tell their management how quick and easy it is. Managers don't know or care about the past - they only know that what should take minutes is taking months. If that department had been able to turn around a VM request in a day or two - not unreasonable - they would have been safe. Now they are being asked questions for which they have no good answers, such as, what exactly do all of you do all day?
and also in these days of data breaches, where sensitive data was just lying around in open s3 buckets, more and more places require approval from above before adding anything to aws.
While your reasons are valid you are missing an important one:
Resource scarcity: the engineers that I would need to allocate to infrastructure, I'd rather have working on user-facing features and improvements. Talent is scarce; being able to outsource infrastructure frees up valuable engineering time.
This is one of the main reasons, for example, that Spotify (I’m not working for them) is moving to google cloud.
There can be advantages in that the developers more often can do the tasks passably well enough that you can spread tasks around, but if it's not accounted for a lot of the time people are fooling themselves when it comes to the costs.
When it comes to large companies like Spotify, the situation changes substantially in that they're virtually guaranteed to pay a fraction of published prices (at least that's my experience with much smaller companies that have bothered to negotiate).
This has been my experience working with companies that use cloud services as well.
Another big waste of time is on application optimization especially around database usage. Cloud services tend to provide very low IOPS storage (and then charge exorbitant amounts for semi-decent performance) which forces spending a lot of wasted time on optimization which would never be an issue on dedicated hardware.
It's generally the case across large parts of IT. I confused the heck out of the first manager I sent itemized weekly reports to, covering the cost of each functional area and feature request (based on average salary per job description), as he'd never seen anything like it before. But it very quickly changed a lot of behaviors when they realized the value of the resources spent on various features.
It is cheaper than hiring a full DevOps team, which is a better apples-to-apples comparison. By spreading the load across the dev team I can automatically get a high bus factor and 24/7 on-call rotation. If the load cannot be spread across the team but requires specialized DevOps engineers, then I lose both those very important points. Obviously once your company is large enough it's different, but for small teams/companies it is an important factor.
My experience is that it often is more expensive when you actually account for lost development time.
> By spreading the load across the dev team I can automatically get a high bus factor and 24/7 hour on-call rotation.
... and my experience of this is that I'll never do it, ever, because of how it affects retention.
> Obviously once your company is large enough it's different but for small teams/companies it is an important factor.
Most of my clients have been small to medium sized, and I see the cost tip in teams with as few as 3-4 people at times.
This assertion supports what vidarh wrote. What you wrote has nothing to do with DevOps or software or engineering - what you are really saying is that you are saving money by coercing your developers into working two jobs at the same time. I have been in this position as a developer at a company where we had on-call rotations. This is a false economy and a quick way to increase stress, alienate employees, and increase turnover. Infrastructure tasks get neglected and are performed poorly because those tasks are now just necessary distractions to the main workload of feature development, to be gotten over with as quickly as possible. A lot of things get overlooked because no one "owns" areas like backups and disaster recovery.
It doesn't take many people to do soup-to-nuts businesses.. think WhatsApp's 50 engineers, or Netflix's 100-person OCA team (if you don't think OCA is a standalone product you don't know much about technology business) doing 40% of the Internet by volume.. the vast majority of people who work in technology just aren't very good. Business governance grossly underestimates the effects of mediocre performance.
So the real question is why aren't governors trying to encourage WhatsApp and OCA style businesses, it's far more cost efficient. I understand why an organization itself empire builds, misaligned incentives.
Cloud services still need configuring and managing. You're saving on 2-3 days upfront on racking and cabling, on boxes that will last at least 3 years, probably longer. So if this is your only reason, you're making a false economy, eventually the costs of an undermanaged cloud will bite you (e.g. VM sprawl, networking rules that no-one understands, possibly-orphaned storage, etc).
"Infrastructure" is a little broader than just some cabling, much broader. You're also assuming that whoever will be in charge of DIY is a) more competent at scale than whatever will be scraped together for the cloud, and b) available with no resource cannibalisation.
The point the person you're replying to was trying to make was that for every "good" hire you're deciding where to allocate them, and sourcing plumbing from a cloud provider lets you allocate preferentially to product development (ie business growth). Even if you "pay more" for that setup, in theory business growth you will achieve more rapidly pays for it many times over (first mover advantage, market leader advantage, cost of money over time, etc).
The costs of pinching the wrong penny and making technical hiring more difficult, diluting your talent pool, can be the difference between huge success and too little too late. An undermanaged local setup that cost you 3 years on Time to Market will bite you long before 'eventually' comes, and you won't have oodles of cash to fix the problem.
For personal projects, I use AWS and Azure (though I am likely to migrate everything to a single box at OVH because it turns out to be cheaper for better performance - go figure), and it's made a certain amount of sense (up to now). At work we use dedicated hardware, because the cloud can't deliver the bang per buck.
At growth companies using cloud makes complete sense, because it's all about time to market and iterating on your business proposition. Requirements change all the time, and the flexibility of cloud offerings gives you velocity. Whereas at scale/in maintenance mode it makes much more sense to cut corners and optimise spending.
Either way you want to focus on what brings the most business value.
Those two reasons (for needing aws) are the technical problems that aws can solve, but you can't solve in-house. That is, they are not even solvable on a boardroom whiteboard, where the board pretend everything is just a matter of resource (money) allocation.
But (imo) most of the things that companies fail on... it's not because it is impossible to do a good job. They fail for less inevitable reasons.
In any case, I actually like the strategy where you try to be good at the things that you're good at but minimize things you need to be good at.
Dropbox knew that aws was expensive. If the numbers here are real, then in-housing would have been a big efficiency gain (on said boardroom whiteboards) for years. Makes sense when you consider what Dropbox does.
I assume they paid this price because it let them avoid being an infrastructure company. They would have had to be a very good infrastructure company. Why introduce this additional failure point, limiting factor or whatnot?
I (maybe you too) have seen the kinds of problems that aws solves be the limiting factor in a bunch of companies. The fact that they're technically solvable is almost academic, at some point.
tldr, sometimes it's good to solve certain problems with money.
- you need to iterate rapidly with scale and reliability. if you have the right expertise this becomes very quick to set up. it lets you focus 100% on product iterations.
- you need (predictable) on-demand compute for crunching large amounts of data or running some batch jobs. it just doesn't make sense to do this on your own equipment.
- your median cpu utilization is low, so you want to save costs and you move to a serverless architecture, effectively moving the cpu utilization you pay for to 100%.
- But most importantly AWS isn't just compute and storage primitives: AWS has a vast array of abstractions on top of the cloud primitives: managed clusters, machine learning services, virtual desktops, app streaming, CI/CD pipelines, built-in IAM and Role based access control, to name a few !
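The serverless point in that list can be made concrete with a toy calculation (every rate here is hypothetical; real per-invocation pricing is more complicated). The idea: serverless compute costs more per CPU-second, but you stop paying for idle time, so at low median utilization it can still win.

```python
# Toy comparison (all rates assumed, for illustration only).
SERVER_MONTH = 200.0       # assumed cost of an always-on instance
MEDIAN_UTILIZATION = 0.10  # the instance sits idle 90% of the time
SERVERLESS_MARKUP = 3.0    # assume serverless compute is ~3x per CPU-second

useful_compute = SERVER_MONTH * MEDIAN_UTILIZATION  # value of compute actually used
serverless = useful_compute * SERVERLESS_MARKUP     # premium rate, but only on that

print(f"always-on:  ${SERVER_MONTH:.0f}/month")
print(f"serverless: ${serverless:.0f}/month")
```

With these made-up numbers the 3x markup is swamped by the 10x waste of idle capacity; flip the utilization up toward 100% and the always-on box wins again, which is why this argument only applies to spiky, low-duty-cycle workloads.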
Now more and more companies are locked in to Amazon.
Hard to find a good old Datacenter Admin.
I know perfectly well how to provision and scale a large infrastructure and can give you 99.999% availability in any application, BGP if needed.
Yet no one is interested in that. Sure I can write Ansible scripts and Terraform policies, but it's a miniscule part of my skillset and doing it on AWS is just boring compared to building the backend that powers it.
The counter argument I have is that at different sizes of operations, completely new skills become important, so you and your staff are left behind.
Example: my previous employer became large enough in terms of hardware footprint (~>1M cores) that it started getting difficult to find commercial colo space. How good are software and systems engineers at electrical engineering? :)
Granted, if you need 1M+ cores, you're going to be dealing with humans most places (including AWS) to get the best deal possible, and that also means the cost differences can change fairly substantially (e.g. the instances I know of that are in "ask us" territory are not paying anywhere near published prices)
That said 1M cores is not that much. Depending on your needs it's as little as "just" 500 racks. Plenty of managed providers will be happy to provide customized service to e.g. design and/or manage a setup for something that size.
You go to Google Cloud and order 10k preemptible instances per datacenter. That solves half of your problem. Then do the same on SoftLayer and AWS, and review monthly who is cheapest. It's not very difficult.
Contrary to you, I think most places are not ready to accept that kind of contract. A friend of mine evaluated moving one of his compute grids to DigitalOcean and was contacted and asked to stop: their servers are not meant to run at 100% CPU all the time.
Regarding cloud being much higher cost: indeed. Anecdotal evidence; I know of a big reference customer who said they would consider the move to cloud a success if total cost didn't go up more than 2x. A few years in, that hadn't happened yet...
Twice as much would be a huge savings. In the cloud, I can give you 20k cores in one hour and 100k cores in two hours, after the initial investment to open multiple regions and network backbones. The price is known in advance and the investment can be cancelled anytime.
Obtaining the same 20k cores from a colo will be a hassle: months of negotiations with nothing being done. I'd expect any sane provider to bail when you try to discuss 100k cores; it's too much upfront investment. It would take years to get half of it, and the project would be long abandoned before anything is delivered, with everyone who worked on it gone.
As a tech enthusiast I love what's possible with AWS, Azure, GC, etc. As a salesperson I don't mind selling these services (although the margins stink compared to selling VPS or dedicated). But there is a lot of cloud-overkill going on out there.
The big reason for most people is CAPEX vs. OPEX - even if it doesn't make financial sense in a dollar amount, it does in an accounting sense. Investors don't like to see big CAPEX numbers but seem fine with large OPEX ones.
If things go pear-shaped, large OPEX numbers resolve themselves as OP-erations get slimmed and shut down. Large CAPEX numbers, in the same situation, resolve themselves through liquidation and tears...
More importantly, OPEX comes from next year's profits, yielding a business I can loan against. CAPEX comes from last year's profits, increasing the amount of loans I need to get it together.
It's the difference between thinking about short term profit margins and thinking about asset growth over time. Throwing a lot of optional cash today at a problem is better business than being forced to throw non-optional cash at a problem whenever the problem is feeling problematic. It's also quite freeing in terms of M&A.
That's a very illuminating way to highlight the accounting fears mentioned in the gp.
No one tries to build their own power station, or make their own laptops. They're better off using engineering resources on higher-order stuff, unless, like Dropbox, the margin/TCO you are getting on your storage is an absolutely huge deal.
They might outsource most of the work, but then the gap between generating and using power is much more clearly defined.
someone asked me by PM how to get your hands on that pile of virtual cash. I'm no expert on that, and likely YMMV, but for us, we basically called account managers at the big 3 (GCP, AWS, Azure) and went shopping. With some PR visibility (i.e. seed funding, some stupid "innovation award", I guess any alternative empty proof of existence applies), we got offers (in the 50-500K range).
The way I see it, that's the de-facto standard - launch a VC-funded startup, and cloud providers will line up at your door to shower credits on your head.
Google also has a dedicated page where they partner with a bunch of VCs.
The best win is to build out in a data center with a 10G pipe into Amazon's network so that you can spin up AWS only on peaks or while you are waiting to bring your own stuff up. That gives you the best of both worlds.
AWS has a product called Direct Connect to reduce bandwidth costs between AWS and your infrastructure.
There is a latency spike between local and Amazon infrastructure, so it would be critical to design your system so that putting this spike in the mix doesn't impact your own flow path.
I had thought a bit about how we might do that at Blekko, and I would probably shift crawling into the cloud (users don't see that compute resource) and move crawler hosts over to the frontend/index side. But I'm sure there would have been a bunch of ways to slice it.
They certainly can and likely will respond by tweaking their cost models - bandwidth costs at AWS are completely out of whack - e.g. list prices at AWS per TB are tens of times higher than Hetzner's, for example; I presume that's based on looking at what customers are most sensitive to. E.g. if you retrieve only a small percentage of your objects every month, it won't matter much. Similarly, if most of your retrievals are from EC2 rather than from the public internet, the bandwidth prices won't be a big deal, and you may not pay attention to how much you actually pay per TB.
The high bandwidth prices hit some niches much more than others, and it may be more expedient for AWS to keep it straightforward for those affected to put caches in front than it is to rattle the cage of other customers. E.g. if you consume huge amounts of bandwidth you'll sooner or later speak to specialized CDNs or start looking at peering anyway.
1. To outsource non-core activities to experts and reduce risk, for firms that see IT as a cost center. A cost-cutting measure.
2. To provide dynamic capacity for mature businesses that experience anticipated workloads that are short-lived (seasonal or computational needs). A cost saving measure.
3. To provide dynamic capacity for new ventures growing in popularity i.e. fluctuating capacity requirements. This saves on large upfront infrastructure costs when long-term viability hasn't been established. A risk management measure.
4. To be described as "innovative" because peers are doing it, for firms that see IT as a revenue center (in industries that view such investments as a source of differentiation). A form of virtue signalling.
5. To make the accounts look good to investors by accounting for it as OPEX instead of CAPEX. A seemingly irrational but valid reason. High OPEX numbers are easier to justify and more importantly, can be pared down with less friction than CAPEX, if things go south from intense competition for instance. Another risk management measure.
I make the case that colocating pays off at just about any scale, assuming you have $10k in the bank, have a use for at least 40 cores and are able to pay upfront to handle anticipated scale.
Hurricane Electric has prices online of $300/mo for a rack. On AWS, a single full c4 machine (36 threads) costs $1.591 per hour x 24 x 30 ≈ $1,145/mo -- more than the monthly rent on a whole rack holding 40 machines. Decent internet can be had for a few hundred per month.
Ok, so how about buying your own machines? An E5-2630 with 20 threads is $700 x 2 = $1,400; motherboard + disk + SSD brings it to several thousand, so it will pay off in at most 6 months, and we're not even talking about bandwidth or storage costs. Depending on the application you could be looking at a payoff after 2-3 months.
Worried about installing or remote management? IPMI, iDRAC, etc included with basically every server make this a piece of cake.
The only good cases for cloud are: you may suddenly scale 10x and can't predict it; you don't have $10k in the bank; or you don't have 1-2 months to order machines and sign a contract for rack space.
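The payoff claim above, as a back-of-envelope script. All figures are this comment's estimates, with an assumed $4k per-server build cost standing in for "several thousand":

```python
# Back-of-envelope colo-vs-EC2 payoff, using the figures from this comment.
# SERVER_BUILD_COST is an assumed round number, not a quote.

EC2_C4_MONTHLY = 1.591 * 24 * 30   # ~$1,145/mo for one full c4 instance
RACK_MONTHLY = 300                 # Hurricane Electric list price for a rack
MACHINES_PER_RACK = 40
SERVER_BUILD_COST = 4000           # assumed: CPUs + board + disk + SSD

# Monthly cost of one colo'd machine: its share of the rack rent.
colo_monthly_per_machine = RACK_MONTHLY / MACHINES_PER_RACK

# Months until the hardware purchase is recouped vs renting the EC2 instance.
payoff_months = SERVER_BUILD_COST / (EC2_C4_MONTHLY - colo_monthly_per_machine)
print(f"payoff in ~{payoff_months:.1f} months")
```

That lands around 3.5 months, comfortably inside the "at most 6 months" claim, before even counting bandwidth or storage savings.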
Sure, for a few instances it's likely to be true, because there is a certain amount of fixed overhead for dedicated hardware, but it remains a relatively low constant. In reality there isn't much difference in terms of labour between wrangling hundreds of AWS instances and hundreds of servers, and the servers will be many times cheaper to run.
DISCLOSURE - I am a Joyent employee.
It’s disingenuous to inflate the compensation for datacenter employees as a bogeyman, or to wave away a “modern cloud DC” as a Herculean undertaking. There rarely is any R&D required, and stock compensation isn’t always necessary.
The only real example I have good knowledge of (1 of 1 cases) is a tech friend who worked for a firm acquired by a company wanting to build out data centers. Everyone left after their acquisition stock awards vested, before really adding value to the corporation and delivering on the promise, because the company did not have a "tech R&D" comp plan in place to keep the subject matter experts employed. In the end the project and the DC expansion fell short.
Most people I know who build datacenters don’t expect a tech R&D budget or outrageous compensation, as it’s general contracting, racking/stacking equipment, and infrastructure engineering (PXE booting, remote consoles, reimaging for OS and/or hypervisor, etc).
Here’s a non-comprehensive list I generated on the fly yesterday of tech orgs who run their own datacenters or equipment in colos:
For all this talk about how expensive S3 would be for a filesystem, and how poorly suited AWS is for this kind of thing, Dropbox has seemed to make it work just fine.
For most people (including every commercial project I've ever worked on), the time-saving and safety benefits of relational schemas was far greater than any theoretical infrastructure savings.
This is so dangerous even to type out :) In some, limited cases, Dynamo can replace tables in an RDS, and completely outperform it, too. I'm a big fan, but it's fundamentally different from an RDS, and you can get burned, badly.
Oversimplified, DynamoDB is a key-value store that supports somewhat complex values. If you have large values, use S3 instead. I think that's a good way to think of it: a faster S3 for loads of small records (with a nicer interface for partially reading and updating those records).
If you need to look up on anything but the primary key, be careful: costs can get out of control by having to provision extra indexes. If your primary keys aren't basically random, you'll run into scaling issues because of the way DynamoDB partitions. If you need to look at all the data in the table, DynamoDB probably isn't the right technology (it can, but scans absolutely tank performance = $$$).
DynamoDB as a hugely scalable, limited-scope data source in your app is likely where you'll find the optimal cost/scalability point. By using Dynamo for scalable read-heavy activity you let the rest of the app be barebones and 'KISS', preserve competencies & legacy code, and retain the benefits of relational schemas. Dynamo's scaling then becomes your cloud-cost tuning mechanism.
By way of example(s): if your editors all use RDS and you publish articles to DynamoDB, you could be serving tens of millions of articles a day off a highly non-scalable CMS. If all your reporting functions pull from DynamoDB, you could be serving a huge enterprise post-merger while using the same payroll system as pre-merger. Shipping tracking posted/grabbed from Dynamo, purchasing logic on the best Perl code you could buy in 1999 ;)
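On the "extra indexes" cost warning: a rough, entirely illustrative model of how each global secondary index multiplies the provisioned-throughput bill. The per-unit prices below are placeholders, and real DynamoDB pricing varies by region and capacity mode:

```python
# Rough model of how provisioned throughput (and extra indexes) drive
# DynamoDB cost. The per-unit prices are illustrative placeholders only.

RCU_PRICE_PER_HOUR = 0.00013   # placeholder price per read capacity unit
WCU_PRICE_PER_HOUR = 0.00065   # placeholder price per write capacity unit
HOURS_PER_MONTH = 24 * 30

def monthly_cost(rcu, wcu, global_secondary_indexes=0):
    """Each GSI carries its own provisioned throughput, so adding one to
    support a non-key lookup roughly doubles the throughput bill
    (assuming, for simplicity, each GSI is provisioned like the base table)."""
    tables = 1 + global_secondary_indexes
    hourly = (rcu * RCU_PRICE_PER_HOUR + wcu * WCU_PRICE_PER_HOUR) * tables
    return hourly * HOURS_PER_MONTH

base = monthly_cost(rcu=1000, wcu=200)
with_two_gsis = monthly_cost(rcu=1000, wcu=200, global_secondary_indexes=2)
print(f"base: ${base:.2f}/mo, with 2 GSIs: ${with_two_gsis:.2f}/mo")
```

Two extra lookup patterns triple the bill under these assumptions, which is why "be careful" is the right advice before bolting indexes onto a Dynamo table.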
If you are going to do joins and that sort of thing, forget about DynamoDB. It cannot do that, and for a good reason. That being said, our architecture is mostly SPA, so the lack of joins is solved at the client - there are just more calls to services client-side, but the effect of that is still a cheaper and faster product to run and maintain.
Does anyone know of a middle-layer solution that automatically concatenates files into larger chunks (similar to HDFS) prior to persisting to S3 and for retrieval utilizes HTTP Range requests to fetch the individual file? I'm going to build it if it doesn't exist. 100MB chunks would reduce our S3 PUT costs by ~1000x.
Edit: “dar”, “zip”, or “dump/restore” might be able to do this:
https://serverfault.com/questions/59795/is-there-a-smarter-t... (will create an archive with an index file for random access, vs tar streaming requiring reading the entire file)
If you come up with another solution, I’d love to hear it!
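For what it's worth, the pack-and-Range idea reduces to a few lines. A dict stands in for S3 here; in practice `put_object`/`get_object` with a `Range: bytes=start-end` header would replace the byte slicing:

```python
# Sketch of the concatenate-then-Range idea: pack many small files into one
# large object, keep an offset index, and read any single file back with a
# byte-range request. Byte slicing stands in for the HTTP Range request.

def pack(files):
    """Concatenate {name: bytes} into one blob plus an offset index."""
    blob, index, offset = bytearray(), {}, 0
    for name, data in files.items():
        index[name] = (offset, len(data))   # (start, length)
        blob.extend(data)
        offset += len(data)
    return bytes(blob), index

def fetch(blob, index, name):
    """Read one file back via its byte range (i.e. Range: bytes=start-end)."""
    start, length = index[name]
    return blob[start:start + length]

files = {"a.json": b'{"x":1}', "b.json": b'{"y":2}'}
blob, index = pack(files)       # one PUT instead of len(files) PUTs
assert fetch(blob, index, "b.json") == b'{"y":2}'
```

The index itself has to live somewhere (a header in the blob, a sidecar object, or a database), which is the main design decision left open here.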
Even if you don't include any extra costs, just keeping 10 TB on Hetzner will cost you 40€ max, whereas with S3 it will cost you at least $230.
When we built JuiceFS, a POSIX filesystem on top of S3, there was a feature on the roadmap to combine multiple small files together before uploading to S3 (each file is a slice of a chunk in S3). We will let you know when it's ready.
Then use Lambda transform in firehose to index the byte ranges for each record (file)? (Biggest q in the process would be here... you might have to index after it hits s3)
S3 also allows the Range request header out of the box as well.
Interesting problem, and I can totally relate, with S3 PUT costs being astronomical.
Essentially our migration over the years looks like:
1) Moving to EC2 from a VPS: the EC2 instance with the same specs is notably slower, and you need to add a second instance where one worked before.
2) Moving to a managed service like RDS after running the service on EC2: the managed service with the same specs is notably slower, and you need a second instance where one worked before.
In the end, it's worth it, in the RDS example you're getting millisecond replication times, point-in-time database recovery, hot failover, etc. But still, it would be great just once if you got the same performance from an RDS 2xl as you'd get running your own DB on a 2xl of your own.
Going out on a limb here but I'm guessing that Amazon makes a good amount of profit from their online retail store.
It goes something like this (from when you're small to when you're huge):
- very/quite cheap
- prohibitively expensive
- much cheaper than building and maintaining own infra
- prohibitively expensive/more expensive than building and maintaining own infra
So various companies will end up at various stages of this cycle. Dropbox is big enough to be in the latest stage.
AWS is not a "once the sweet spot, always a sweet spot".
Bandwidth has always been insanely(!!) overpriced with most cloud providers.
Maybe if there will only ever be one client connected 24/7 and that never changes, but I can’t think of a real product that would avoid taking on more than one client.
Using AWS meant that Amazon had IP logs for Dropbox's users. It had detailed information about Dropbox's business velocity. It had the ability to shape the customer experience of Dropbox's customers. In short, it made sense for Dropbox to move off of AWS for the same reasons that it makes sense for obvious Amazon competitors like Home Depot and Target not to build on AWS. However, unlike Home Depot or Target, the other major cloud providers, Google & Microsoft are also potential competitors in the file storage service market.
That doesn't seem fair to say, without proof. Why would they risk all the other business they do, plus GovCloud, etc, to spy on a company? Are AWS and Dropbox even competitors? I don't think that assertion holds at all.
They reduced spending on AWS by $92.5M, but still store 10% of data there so ~$103M?
Looking at the $53M cost for their data center vs the $92.5M decrease to the "third-party datacenter", they saved around 43% moving to their own data centers.
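Sanity-checking those figures (the numbers come from the comments above, not from any actual filing):

```python
# Sanity check on the Dropbox savings figures quoted above.
aws_decrease = 92.5          # $M cut from the "third-party datacenter" line
own_dc_cost = 53.0           # $M cost of running their own data centers
remaining_share = 0.10       # fraction of data said to still be on AWS

savings_fraction = 1 - own_dc_cost / aws_decrease        # ~43% cheaper in-house
implied_original = aws_decrease / (1 - remaining_share)  # ~$103M original spend
print(f"saved ~{savings_fraction:.0%}, implied original spend ~${implied_original:.0f}M")
```

Both back-of-envelope results match the parent comments: roughly 43% savings, and roughly $103M of implied original spend.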
Put another way, use the cloud unless your valuation depends on cutting infrastructure operation costs
Those are mostly a given. What runs on top of it, that's what interests me: some orchestration/private cloud software, some container platform, and perhaps a storage platform. Which/what are those for Dropbox? Would it be standard OpenStack, Kubernetes, and the true in-house stuff: Dropbox object storage?
If I remember correctly, they noticed they could cut costs significantly in storage compared to S3 because the vast majority of files in your Dropbox folder just sit there and hardly ever need to be read or written again (photos, etc.).
If they’re not using AWS for compute, it’d be interesting to see what sort of similar reasoning they have for why the costs are cheaper in house.
Bandwidth costs alone for S3 are high enough that I've set up storage systems for clients that used S3 for durability but put it behind huge caching proxies that held a fairly substantial proportion of their total data, where the reduction in S3 storage paid for the cache setup several times over.
Storage costs are harder to cut, but yes, S3 is typically expensive there too, but it's also riskier to do your own because of the risk of data loss. At Dropbox's scale it'll be worth it, but letting Amazon worry about durability is one of the few cases where I'll happily recommend AWS despite the high costs.
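The caching-proxy setup described above reduces to plain cache-aside. A minimal sketch, with a stub standing in for the S3 fetch (no real AWS calls here):

```python
# Minimal cache-aside sketch: serve reads from a local cache and only fall
# back to S3 (stubbed below) on a miss, so repeated reads of hot objects
# never touch S3 bandwidth again.

s3_fetches = 0

def fetch_from_s3(key):
    """Stand-in for the real S3 GetObject call; counts how often it's hit."""
    global s3_fetches
    s3_fetches += 1
    return f"data-for-{key}".encode()

cache = {}

def get(key):
    if key not in cache:            # miss: pay S3 bandwidth exactly once
        cache[key] = fetch_from_s3(key)
    return cache[key]               # hit: free

for _ in range(100):
    get("hot-object")
print(f"100 reads, {s3_fetches} S3 fetch(es)")
```

A production version adds eviction (the cache can't hold everything) and cache invalidation, but with a mostly-cold dataset like Dropbox's, even a simple LRU over a large disk can absorb the bulk of retrieval traffic.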
Smaller companies don't have the option to hire a whole IT team.
Is reporting massive savings on an infrastructure project a signal of softening demand and a weak product pipeline?
There's a kernel of truth in what you're saying, perhaps. Any growing company has to choose between devoting resources (engineers, cash, executive overhead) between different projects.
A lot of the time companies expand by capturing market share through adding features and expanding their TAM.
But it's foolish to suggest that lowering COGS and increasing margin means a company is in trouble.
Let's say your company is now making $1B/year (Dropbox) and you can choose between adding 3% more revenue or lowering your costs by 7%.
In most cases, for your IPO, you'll get a higher aggregate shareholder value by cutting the costs and increasing your margin.
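Worked out with numbers: assume a $700M cost base (i.e. a 30% margin — my assumption, not Dropbox's actual figures) and that the extra revenue comes in at no extra cost:

```python
# Revenue growth vs cost cutting, on an assumed 30% margin business.
revenue = 1_000.0   # $M/year, from the example above
costs = 700.0       # $M/year, assumed cost base

# Option A: grow revenue 3% (assuming the extra revenue costs nothing extra).
profit_more_revenue = revenue * 1.03 - costs   # $330M

# Option B: cut costs 7%.
profit_lower_costs = revenue - costs * 0.93    # $349M

print(f"grow revenue: ${profit_more_revenue:.0f}M, cut costs: ${profit_lower_costs:.0f}M")
```

Under these assumptions the 7% cost cut adds $49M of profit against $30M for the revenue growth, and the gap widens further if the extra revenue carries its own costs.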