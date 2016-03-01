He contacted the developer who said that they were shutting it down because the server costs were higher than the money they were making.
They were spending 5k a month on AWS crap and claimed it was impossible to get any lower.
He helped them consolidate everything onto a single rented dedicated server costing 400 a month. Now the service is profitable, and will stay up.
It runs way faster on the single server, and it has required less maintenance since the move.
This kind of shit is everywhere. At this point simply not using AWS is a competitive advantage.
The 50%+ profit margins have to be coming from somewhere. AWS is not made of magic; it's made from largely the same PC parts you buy on Newegg.
Charge more than what it costs you. That's how to make money.
Use the right tool for the job.
Incidents like this are generally why rate limits exist, which they don't currently have [0]. Perhaps they'll consider putting a burst limiter in place to dissuade automated tests without penalizing organic human load spikes.
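For what it's worth, one common way to build that kind of burst limiter is a token bucket. This is only a minimal sketch, not anything Segment has described: the class name, the `rate` and `capacity` parameters, and the injectable `clock` are all assumptions made for illustration.

```python
import time


class TokenBucket:
    """Hypothetical burst limiter: allows bursts up to `capacity` requests,
    refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate            # sustained requests per second
        self.capacity = capacity    # largest burst tolerated
        self.clock = clock          # injectable so the limiter can be tested
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        # Refill for the elapsed time, never beyond the bucket capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False                # caller would answer with HTTP 429
```

A human-sized spike fits inside `capacity` and passes, while a test script hammering one key drains the bucket and gets rejected until it refills.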
Unfortunately there doesn't seem to be an easy way to fix the per-user ID write bottleneck, short of adding a rate limit to the API – which would push backpressure from Dynamo to the Segment API consumer. Round-robin partitioning of values would fix the write bottleneck, but has heavy read costs because you have to query all partitions. They undoubtedly performed such analysis and found that it didn't fit their desired tradeoffs :)
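To make the tradeoff concrete, here is a rough sketch of round-robin partitioning for a hot user ID. It uses an in-memory dict as a stand-in for the table, and `ShardedStore`, `shards`, and the `(user_id, shard)` key layout are made up for illustration rather than taken from Segment's schema.

```python
import itertools
from collections import defaultdict


class ShardedStore:
    """Hypothetical store that spreads one hot key across `shards` partitions."""

    def __init__(self, shards):
        self.shards = shards
        self.table = defaultdict(list)   # stand-in for a partitioned table
        self._next = itertools.count()   # round-robin counter

    def write(self, user_id, value):
        # Each write lands on the next partition, so no single
        # partition absorbs all the traffic for a hot user ID.
        shard = next(self._next) % self.shards
        self.table[(user_id, shard)].append(value)

    def read(self, user_id):
        # The cost: a read has to visit every partition for the key.
        values = []
        for shard in range(self.shards):
            values.extend(self.table.get((user_id, shard), []))
        return values
```

Writes get cheaper by a factor of `shards`, but every read turns into `shards` lookups, which is the read cost mentioned above.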
Great post, very informative. Thanks for sharing! Also, love the slight irony of loading AWS log data into an AWS product (Redshift) to find cost centers.
[0]: https://segment.com/docs/sources/server/http/#rate-limits
Give me what I save you in 2 months and I'll have a good business :)
I used that exact same model in Conversion Rate Optimization - get your conversion rate up, give me 30% of what we improve.
And built that into a 20+ person digital agency billing millions of dollars a year before being bought out.
Exactly how I did that, and you can too:
(1) Wrote topical, detail rich posts similar to the parent here about problems I was solving in CRO for a handful of customers, never disclosing confidential customer info.
(2) Marketed those posts strategically. E.g., I wrote one about "Which trust symbol gives you the highest return on conversion rate?" and then literally just bought Google AdWords ads targeting people searching for that question! StackOverflow and other forums are also great ways to market, either by answering questions (free, and you put your details in your contact info) or by running ad campaigns specifically on those topics ($5k+).
(3) Turned the best performing / most viewed posts into "pitches" for speaking gigs at materially similar conferences. Most were accepted, and I became an "authority".
Every post / conference / etc had a little "Want us to fix it for you? Full service, performance fee model." banner or mention.
Work literally poured in after that and we were lucky enough to be very choosy.
If you can SAVE large enterprises money and are willing to do it on a performance basis you've got a business.
Having said that, all the cost savings initiatives I've spearheaded are on my resume and LinkedIn profile and I take great satisfaction in optimizing those environments to save the client money.
I decided that all of my personal projects will be on GCE. It is already much more cost efficient, and Google will soon allow me to commit to future usage and pay for that commitment as I go. Right now AWS forces you to pay upfront to get the same discount (~50%).
Fortunately, we cache everything (including failures) with Redis, so the actual cost is minute/incremental at most, but if you are not caching failures as well as successes, this can result in unexpected and really hard to track down cost spikes. (disclaimer: AWS cert SA, AWS partner)
Segment's trick of detecting and recording throttling, and using that as a template for "bad keys" (which presumably are manually validated as well), seems like a great idea too, but I'd suggest first caching even failed calls on logins if possible, as that probably would have avoided the need to ever hit Dynamo.
PS: the name 'project benjamin' for the cost-cutting efforts... pure genius.
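A rough sketch of what caching failures as well as successes can look like. It uses a plain dict with expiry times as a stand-in for Redis so it runs on its own; `LookupCache`, `backend`, and both TTL values are hypothetical names and numbers, not our production setup or Segment's.

```python
import time


class LookupCache:
    """Hypothetical cache that remembers failed lookups as well as hits."""

    def __init__(self, backend, ttl_ok=300.0, ttl_fail=30.0,
                 clock=time.monotonic):
        self.backend = backend      # callable; raises KeyError for bad keys
        self.ttl_ok = ttl_ok        # seconds to keep a successful result
        self.ttl_fail = ttl_fail    # shorter TTL for remembered failures
        self.clock = clock
        self.entries = {}           # key -> (expires_at, ok, value)

    def get(self, key):
        now = self.clock()
        hit = self.entries.get(key)
        if hit is not None and hit[0] > now:
            _, ok, value = hit      # served from cache, backend untouched
        else:
            try:
                ok, value = True, self.backend(key)
            except KeyError:
                ok, value = False, None
            ttl = self.ttl_ok if ok else self.ttl_fail
            self.entries[key] = (now + ttl, ok, value)
        if not ok:
            raise KeyError(key)
        return value
```

With this in place, a client retrying the same bad key costs one backend call per `ttl_fail` window instead of one per request.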
[1] https://segment.com/blog/ui-testing-with-nightmare/
Much of what allowed us to implement these savings quickly with a small team was the flexibility afforded by cloud infrastructure. Poor decisions are easy to reverse, but in a bare metal world you better be damn sure what you're doing, which slows down the decision-making process and seriously complicates experimentation. The number of people who know how to build out datacenters at the scale of thousands of machines is vanishingly small.
We'd also need to replace IaaS services like ECS, ELB, ElastiCache, RDS, and DynamoDB. There are certainly off-the-shelf replacements, but we'd need to build out the expertise within our teams to operate these systems. We're talking roughly a dozen or so engineers working full time for many, many months to get these systems in place from scratch, on top of the even larger effort to design and build out datacenters. I'd much rather plow those cycles into efforts like expanding to multiple regions and improving reliability of internal services. That's a much better return on investment for our customers.
Right now we're in the sweet spot for the cloud. We're way too big to run on trivial amounts of hardware, growing at a rate that makes it difficult to stay ahead of demand in a datacenter-centric world, and too small to justify investing in a scalable hardware build-out.
My gut feeling is that you have to get to an extraordinary size to realize any meaningful savings, but that's primarily based on Dropbox's migration off AWS (https://www.wired.com/2016/03/epic-story-dropboxs-exodus-ama...).
It depends on what your service looks like: CPU intensive, Memory Intensive, Storage intensive? (In reality some unique mix).
You probably won't see huge savings in year one, as you'll be spinning up a lot of new things and have fairly large CapEx. If your growth pattern is steady and predictable, though, you should be able to plan out your hardware buys or use a hybrid solution to handle traffic bursts.
One of the nice things about running your own hardware is that some costs are easier to control. For example, if you don't need new hardware, you don't have to spend on it.
You also have much more control over your environment, so you can really optimize your code and infrastructure and avoid having to scale out to as many systems.
But, back to the question of how to model it: you just gotta dig in, make some educated guesses about performance, test, and repeat.
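As a starting point for those educated guesses, a back-of-the-envelope break-even model might look like the sketch below. The function names and any numbers you feed in are placeholders, and it assumes flat monthly costs, which real growth will break.

```python
import math


def cumulative_cost(months, monthly, upfront=0.0):
    """Total spend after `months`, given a flat monthly bill and upfront CapEx."""
    return upfront + monthly * months


def breakeven_months(cloud_monthly, capex, own_monthly):
    """Months until owned hardware becomes cheaper than the cloud bill.

    Returns None when running your own hardware never pays off.
    """
    saving = cloud_monthly - own_monthly
    if saving <= 0:
        return None
    return math.ceil(capex / saving)
```

For example, with made-up figures of 50,000 a month in the cloud, 400,000 of upfront hardware, and 30,000 a month to run it, `breakeven_months(50_000, 400_000, 30_000)` comes out at 20 months. Rerun it with measured numbers as your performance tests come in.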