I thought I'd answer some of your questions, as the person who pays the bill.
1. This can be cheaper on AWS. We've been meaning to move to reserved instances, paying a year at a time, for a while and simply haven't done it yet.
2. Fastly has already donated CDN usage to us, but we haven't fully utilized it yet as we're (slowly) sorting out some issues between primary gem serving and the bundler APIs.
3. RubyCentral pays the bill and can afford to do so via the proceeds generated from RubyConf and RailsConf.
4. The administration is an all-volunteer (myself included) effort. Because of that, paying a premium to use AWS has its advantages: it allows more volunteers to help out, given how well traveled the platform is. In the past, RubyGems was hosted on dedicated hardware within Rackspace. While this was certainly cheaper, it created administrative issues. Granted, those can be solved without using AWS, but we get back again to wanting as little friction in the administration as possible.
> In the past, RubyGems was hosted on dedicated hardware within Rackspace. While this was certainly cheaper, it created administrative issues. Granted, those can be solved without using AWS, but we get back again to wanting as little friction in the administration as possible.
If Rackspace can be of assistance in the future, feel free to reach out (firstname.lastname@example.org). We currently donate hosting to many open source projects, including ones in a similar space, like the Python Package Index.
Note that if you can get Rackspace or whomever to donate the hardware/bandwidth, you could spend far less than $7k/month to hire a very competent admin to solve the administrative issues, which would probably lead to better service for everybody.
Hey Evan, as with Rubyforge for the last 7-odd years, you'd be welcome to a free account on Bytemark's UK cloud platform bigv.io, or dedicated servers, or a mix on the same VLAN. We're a Ruby shop ourselves, and we host a fair chunk of Debian in our data centre too these days (https://www.debian.org/News/2013/20130404). So just drop me a line if that's of interest <email@example.com>.
I assume this was posted because it's an enormous bill :) but obviously if you're happy with it, carry on!
Did you consider using a mirror network, with servers run by external organizations, instead of going with AWS bandwidth for rubygems? Seems like that would be a good approach for the static/bulk part of your dataset, and there are lots of companies and universities who are set up to serve software. (The mirror I manage serves about 50 TB/month for several Linux distros, and many sites are larger.) Do the work and infrastructure required to manage these networks make them not worthwhile?
Edit: Found a post calling for a RubyGems mirror network. Otherwise there is lots of information about setting up local mirrors of the repository.
It's been discussed many times before, yes. Our users' usage patterns make any kind of mirror delay unacceptable. We currently run a number of mirrors, configured as caching proxies. I want to get us going on a CDN like Fastly soon because they provide effectively the same functionality, but distributed to many, many more POPs than I will ever set up.
I suspect mirror delay is less of an issue than you might perceive it to be. Many CPAN mirrors manage to stay within tens of seconds, and no more than a minute, of the main CPAN mirror that PAUSE publishes to.
If it's just the sync delay, you could track each mirror's last-updated time and only direct users to a mirror that had synchronized with the master since the package in question was released. Otherwise, serve the content from AWS. Though I'm sure this couldn't beat the service that Fastly's donating.
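A sketch of that freshness rule, with a hypothetical Mirror record (the names and URLs are illustrative, not anything rubygems.org actually runs):

```ruby
# Freshness-aware mirror selection: only send a client to a mirror
# that has synced since the requested gem version was published;
# otherwise fall back to the origin. All names are illustrative.
Mirror = Struct.new(:url, :last_synced_at)

ORIGIN = "https://rubygems.org"

def pick_source(mirrors, gem_released_at)
  fresh = mirrors.select { |m| m.last_synced_at >= gem_released_at }
  fresh.empty? ? ORIGIN : fresh.sample.url
end
```

Old gems (most traffic) would hit mirrors, and only just-released versions would fall through to the origin.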
Mirrors shouldn't be a security concern: package signatures should come from "headquarters". The same goes for reliability: clients should be able to, and SHOULD, pull from multiple sites simultaneously.
I could be wrong, but it seems like a nice hack to pull from, say, 3 mirrors at the same time, each at some offset into the resource, using a range GET of say 16k each. The first one to complete issues a pipelined request for another 16k slot, and this process continues until the entire asset is downloaded. The fast mirrors would dominate, a small percentage of the bandwidth from slow mirrors would assist, and truly slow mirrors would be ignored.
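The scheduling part of that idea can be sketched without real HTTP: each "mirror" below is a lambda standing in for a range GET, and a shared work queue gives the "first to finish grabs the next slot" behavior, so fast mirrors naturally take more chunks. Everything here is illustrative, not an existing client.

```ruby
# Toy multi-source chunked download. Each mirror is a callable
# (offset, length) -> bytes; a real version would issue
# "Range: bytes=offset-(offset+length-1)" HTTP requests instead.
CHUNK = 16 * 1024

def parallel_fetch(mirrors, data_size)
  offsets = (0...data_size).step(CHUNK).to_a
  queue   = Queue.new
  offsets.each { |off| queue << off }
  mirrors.size.times { queue << nil }   # one stop signal per worker
  result = {}
  lock   = Mutex.new

  threads = mirrors.map do |fetch_range|
    Thread.new do
      # Workers pull the next unclaimed slot as soon as they finish
      # one, so faster mirrors end up serving most of the chunks.
      while (offset = queue.pop)
        body = fetch_range.call(offset, CHUNK)
        lock.synchronize { result[offset] = body }
      end
    end
  end
  threads.each(&:join)
  offsets.map { |off| result[off] }.join
end
```

A production version would also need retries and timeouts so a stalled mirror's slot gets re-queued rather than blocking the download.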
It would be really interesting to see the bandwidth broken down by gem - I suspect rails would be at the top, but it'd be interesting to see.
If most of the installs are on servers, have you considered talking to server providers about setting up internal mirrors on their networks? That might save everyone a lot of bandwidth.
Of course, people shouldn't really be installing their gems from RubyGems on servers anyway. Is there any way to prod Bundler to default to packaging gems and doing a local install where possible, rather than downloading them every time there is a deploy (the current default)? At present you use double the bandwidth: people download once on their local machine and once on their server to update.
Fetching the RubyGems index with Bundler/RubyGems still takes a while every time I bundle update. Have you looked at optimising that part of the process further, say by caching older gem results? (At least it doesn't fetch a list of all gems now, but it still fetches a list of all versions of each gem, doesn't it?) The list of versions available for old gems should not change, so you should really only need to fetch a very small list of latest versions. The memory and bandwidth usage is still quite high there.
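Because released version lists are immutable, a client could keep them on disk and ask only for what's new. A sketch of that caching idea; `fetch_new_versions` stands in for a hypothetical delta endpoint that the real API does not necessarily expose:

```ruby
# Client-side version-list cache: on each sync, merge in only the
# versions published since the last sync, instead of refetching the
# full list. The delta endpoint here is a hypothetical stand-in.
class VersionCache
  def initialize(fetch_new_versions)
    @fetch = fetch_new_versions        # (since_time) -> { gem_name => [versions] }
    @versions = Hash.new { |h, k| h[k] = [] }
    @synced_at = Time.at(0)
  end

  def update!
    now = Time.now
    @fetch.call(@synced_at).each do |name, vers|
      @versions[name] |= vers          # union, keeping entries unique
    end
    @synced_at = now
  end

  def versions_of(name)
    @versions[name]
  end
end
```

The first sync still pays the full cost, but every subsequent `bundle update` would transfer only the handful of versions released since.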
I built S3stat (https://www.s3stat.com/) to fix the opaqueness that comes with using CloudFront as a CDN, and to get you at least back to the level of analytics you'd have if you were hosting files from one of your own servers.
RubyGems guys, if you have logging set up already, I'd be happy to run reports for all your old logs (gratis, naturally) so you can get a better idea of which files (and as another commenter wondered about, which sources) are costing you the most.
Virtualization allows us to spin up new instances and migrate traffic to them. This means we can work entirely from chef and keep things clean. This is important for our volunteers to have a complete picture of an instance and to be able to make new ones.
Hey, as I mentioned in another part of this thread, my startup crunches those logs for a living (and they're sadly not really designed for crunching by anything that comes off the shelf). Ping me if you'd like a hand doing the crunching.
While one could probably knock a couple thousand bucks off that if one cared to (which is probably penny wise and pound foolish but invariably comes up in HN discussions of hosting costs), the amazing thing is that hundreds of thousands of people worldwide are able to use core infrastructure which costs less than the fully-loaded cost of a single billing clerk in your local municipal water department.
I'll agree mission-critical software exists. I however imagine there are far more engineering projects across the planet whose failure results in mass casualties than software projects. There is a reason actual engineers are legally liable for their work.
What is funny is that GitHub is footing the bill for most package systems, which were likely inspired by RubyGems, yet GitHub itself was built with Ruby gems. I am pretty sure the hosting costs for Homebrew/npm round to nil (I could be wrong).
A lot of the price is bandwidth. They are effectively being reamed by using CloudFront instead of negotiating a better rate with a "real" CDN (which will also give them much better performance, as CloudFront doesn't have many edge locations).
(Although, actually, while I verified their total dollars spent is greater than what would be required to get a fundamentally better deal on bandwidth, I didn't take into consideration that once you slash their costs the amount they would be paying might no longer be ;P.)
> They are effectively being reamed by using CloudFront instead of negotiating a better rate with a "real" CDN (which will also give them much better performance, as CloudFront doesn't have many edge locations).
You can negotiate with AWS to get the same CloudFront pricing as you would with Akamai. I know because I'm in the process right now.
More importantly, they could be running on 2-3 dedicated servers at OVH or Hetzner, with CloudFlare in front of them instead of CloudFront. Or, if they insist on CloudFront, switch to Price Class 100 (US and EU only). It's cheaper, and latency isn't that much higher vs. serving out of all CloudFront locations.
As long as most of your content is static and you have a solid CDN, your origin doesn't have to be highly reliable or scalable. It's just an object store to persist data for the CDN.
Exactly, and if you include a little "Powered by AWS CloudFront" I am pretty sure they could drive down the price a lot.
Or, they could start talking to Fastly, I am pretty sure they can work out a much better deal while being faster.
MaxCDN is a very low-end "CDN". If you can buy your account from the website without talking to an account manager, and the plans are as low as $9/month, you should not expect a lot of performance, features, locations, etc.: what you should, however, expect is "cheap"... MaxCDN is appropriately cheap.
To look at something more reasonable: CDNetworks is realistic competition; they are strong in Asia, and were the people I was comparing the pricing to (so they aren't going to be horribly expensive). According to the comparison website you are using, they have almost four times as many edge locations.
Honestly, though, the reality is that the really great CDNs don't even have data on this website (even for CDNetworks I think this data is not accurate: looks like an approximation): the leaders in this space are Akamai and Limelight, and both just show "Not Available" for the number of edge nodes they have.
Even going a little lower on the CDN pecking order, though: Level3, which according to this website you are using is mostly "competitive" with CloudFront (sometimes actually worse) in the regions CloudFront bothers to cover, is clearly covering entire subcontinents where CloudFront has nothing.
The reality is that CloudFront is still trying to grow out a network: they have poor coverage in Europe (which is pretty key), a few nodes in Japan/Singapore, and then next to no coverage anywhere else. Yet, they insist on pricing their product as if they were a big player (12c/GB is Akamai-level expensive).
(So, do I get to condescendingly say "this is nonsense" now? I mean, seriously: you clearly didn't spend much time using this website and you didn't look into who the leaders are to verify you weren't comparing low-end to low-end... also, I think you are not appreciating that 0->2 is infinitely better ;P.)
Well, like I said, I didn't check just one, but about half in the list. MaxCDN was just in the link because you can't link that site to just one service. Akamai had nothing listed, and Level3 and CDNetworks weren't in the ones I checked. From what I saw, they still have more than most.
I still think you're mischaracterising AWS as being a bit player - they have a decent presence with Cloudfront, it's just that there are a couple that are bigger. Like I originally said, 'more than most'. CDNetworks certainly does pound them in numbers, though.
Some games require massive amounts of compute, but the bandwidth to deliver the assets is generally paid by Apple.
I can guarantee you, your company is paying a metric fuck-ton more. It is called Apple's 30% cut.
Your company is paying AWS $200k to pass json messages around for analytics and social aspects of the game. You are paying Apple something like $1 million per week to distribute, market, and collect payments for the game.
I am not saying your company is dumb, or Apple is evil. I am saying your experience and anecdote isn't relevant to Ruby Gems, and offering a different way to think about the games industry vs. the open source software distribution world.
We aren't paying that much in cut just yet. We're a small team (6 engineers in total). You don't have to be pulling in millions per week to get high on the grossing charts. We're probably around 1/4 of what you estimated.
Though you mention delivering the assets. Actually (like a lot of games) we make a big effort to get under the 50MB over-the-air limit on the App Store. The total content for a retina iPhone is ~300MB, delivered in parts as you progress in the game. That's kept on S3, downloaded through CloudFront.
But yes! You're right, it's mostly a hell of a lot of JSON flying around.
We support non-retina devices, which are stuck on iOS 6. When this came out we weren't sure whether it applied to that too, so we stuck with 50. We'd already been keeping it under 50 for 6 months by then, so we had all the infrastructure set up; it's mostly automated.
Haven't looked at it since iOS 7 launch though, do you know if it was iOS 6 too?
Have somebody spend a day or two looking for low-hanging performance fruit. Start with your JSON library, there are some slow ones out there. Also see if you might be unnecessarily de/serializing data structures multiple times in a single thread or process, I've seen that kind of thing creep up over time in reasonably modular codebases.
The most interesting thing that I found about dealing with stuff at 6 figure+ scale per month on AWS was the un-advertised limits (nodes, provisioned volumes / total size, snapshots, elbs, etc) that you have to either hit, or extract from your account manager.
If anyone ever ends up doing something like this, ask them upfront!
Package Control is a far cry from the scale of RubyGems. PC uses a little over 2TB a month, whereas my calculations show RubyGems using around 50TB.
That said, early on I chose Linode because of their generous bandwidth that is included with the boxes. For the price of less than 1TB of AWS bandwidth, I get 8TB, plus a decent box. The bigger boxes have an even bigger proportion.
I'm not posting this to give any suggestions for RubyGems - I know nothing of the complexity of that setup. Mostly just figured I'd share the research I did for finding reasonably priced bandwidth.
We ran into the same cost-related problems for our CDN. What we did to solve it was to rent dedicated servers near AWS regions, and use Route53 latency-based routing to send traffic to those dedicated servers, running Nginx + Lua. We're serving 300+ TB of traffic per month and the total price is just a percent of the RubyGems AWS bill. There is some maintenance involved with this solution, and the hard part is finding the right dedicated server providers.
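For anyone curious, a setup like that usually amounts to a caching reverse proxy on each dedicated box. A minimal sketch with placeholder names and an assumed S3 origin, not the poster's actual config:

```nginx
# Hypothetical edge-node config: nginx caches immutable package files
# in front of an origin bucket. All names and paths are placeholders.
proxy_cache_path /var/cache/gems levels=1:2 keys_zone=gems:100m
                 max_size=200g inactive=30d use_temp_path=off;

server {
    listen 80;
    server_name gems-edge.example.org;

    location /gems/ {
        proxy_cache gems;
        proxy_cache_valid 200 30d;                    # gem files never change
        proxy_cache_use_stale error timeout updating; # ride out origin blips
        proxy_pass https://origin-bucket.s3.amazonaws.com;
    }
}
```

Route53's latency-based records then just point each region's clients at the nearest such box.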
Since it can take a bit of time to read through the invoice, here's a summary of the bill:
Data Transfer   $3,597
S3              $228
While "bandwidth" costs equate to ~$4,668/month, only $1,071 is CDN (CloudFront), with the balance just raw Data Transfer.
Since lots of folks are commenting, and not everyone realizes the difference, it's also a good time to point out the CloudFront vs. Data Transfer distinction.
Using Amazon's terms... Data Transfer means anything directly served/coming from EC2 or S3 (or a few other services which aren't relevant here), but NOT anything for CloudFront (which is, obviously, a separate line item, as shown above).
The bulk of CDN (CloudFront) usage ($735 worth or 69%) is US.
The bulk of raw bandwidth (Data Transfer) usage ($2,931, ~80%) is US East.
Is any of this good/bad/right/wrong? I have no idea. That depends quite a bit on what THEY are doing with it and why. For example, it can be cheaper to distribute from CloudFront versus straight from S3 for some use cases. Though, generally, you are not only looking at using CloudFront to save money over S3 ...there's typically a performance reason.
And sometimes the hosting costs simply don't matter. It's easy for us engineers here on HN to sit at our keyboards and play around with hypothetical ways to save money. This isn't necessarily a bad thing, but there are numerous things in IT that it doesn't make sense to optimize. Why? Because the ROI on the engineering time, CapEx, and OpEx (and the time, energy, and focus of ANYONE involved or impacted at all) doesn't outweigh the opportunity cost.
Sometimes there are simply better uses of our limited capital and time.
Not everything needs to be optimized. And the argument gets stronger when there are other factors that are more difficult to weigh: adopting a platform that isn't as widely known or isn't backed by a similar level of maturity (even with its quirks, at least they are well known), etc.
The risks/concerns not only vary between organizations, but often from one period of an organization's growth to the next. The beauty is every organization gets to make their own decision ...and none of them have to give a damn if the HN community agrees or not. :-)
While by no means insignificant, this bill is nowhere near what I'd imagine would warrant an HN post. I wouldn't be surprised if most startups beat this regularly.
The startup whose backend I co-created racks up an AWS bill that hovers around a half million dollars a month. We make use of all of the ways to save with Amazon: pre-paid reserved instances, negotiated deals, etc. And we're not even that big; imagine what Netflix's AWS bill must be.
We've tried other providers, toyed with co-locating, but at the end of the day the flexibility and cost benefit of IaaS outweighed the lower base price of CPU cycles when you roll it yourself.
> this bill is nowhere near what I'd imagine would warrant an HN post.
Can only guess at why folks like any post, but it's not necessarily how large the bill is. Maybe it's how low it is for a service that's widely relied on, or maybe it's the level of transparency, which turned out to include evanphx above showing up to answer questions about the project.
With most of this being bandwidth costs, it seems like switching to a host like Digital Ocean would make more sense here. The bandwidth costs are a fraction of Amazon's in comparison.
As for the CDN, switching to something like CloudFlare might make more sense than relying on CloudFront. At the least, there's a "US and EU only" option for edge locations which is considerably cheaper than the default option of all edge locations.
Why was this even posted?
Looking for help reducing it?
Complaining about the amount spent?
Looking for a pat on the back?
I saw a talk at Ruby/RailsConf about the work spent building and maintaining rubygems.org. It smelled a bit martyrish. "Look at the thankless work we perform behind the scenes".
Well, if help is required building or operating rubygems.org, please just say so. As a seasoned Ruby developer I'd be more than happy to contribute development time, and as a daily user I'd be willing to commit financially in a small way towards operating costs. Not that that is required - given all the offers of free hosting this post received in response.
If we don't know about a problem, we can't help. Just ask if help is what you want. It's not like the Ruby community doesn't have great communication channels.
So, what this really comes down to (after a good night's sleep) is what type of traffic/transactions you are running on your back end infrastructure.
If the data is static, then you can probably (these days) cut your costs for 25 Terabytes/month from $8K to $800 (or, in your extraordinary case, $80), simply by being a bit intelligent as to how you make use of VPS/CDN/CloudFlare Transfer allocations.
On the flip side, if much of the data you are transferring out is the result of dynamic back end transactions, queries, and generation, then it's unclear to me that you can (easily) recognize the savings that you might see when generating static content.
I'm interested in knowing if CloudFlare will start throttling/shutting down people who pay $20 and use 25 TBytes in the long term though. That alone, for some organizations, will cost them more than the extra $8K they would pay to AWS (who have zero problem with you using 25TB, 250TB, 2.5PB, etc...)
Yeah, I'll admit that other CloudFlare customers are likely subsidizing the amount of bandwidth I'm using.
Funny thing - back when I was using 10 TB/mo, my site was hosted entirely on DreamHost's $9/mo shared hosting. I moved mostly because I was starting to get several hours a month of downtime - presumably, they were gently nudging me off their service.
I've seen plenty of $60-$100 dedicated servers come with unlimited-use 100Mbit connections, which work out to 16ish TB/mo before you start getting to 50% saturation. Of course, those are still subsidized in that that pricing is possible only because most people who buy it don't max out a 100Mbit connection.
Still, though, S3's 9-12¢/GB bandwidth pricing seems a bit high. Bandwidth at DigitalOcean (presumably unsubsidized) is 2¢/GB, which comes out to a much more manageable $500 for 25 TB.
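A quick back-of-the-envelope check of those rates (treating 1 TB as 1,000 GB for round numbers):

```ruby
# Monthly bandwidth cost at a flat per-GB rate; rates are the ones
# quoted above, and 1 TB is simplified to 1,000 GB.
def monthly_cost(terabytes, dollars_per_gb)
  terabytes * 1000 * dollars_per_gb
end

s3_low  = monthly_cost(25, 0.09)  # low end of the quoted S3 range
do_cost = monthly_cost(25, 0.02)  # quoted DigitalOcean rate
```

So 25 TB runs roughly $2,250 at S3's low-end rate versus $500 at DigitalOcean's, before any negotiation.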
With dynamic content, CloudFlare has Railgun, which takes advantage of the fact that dynamic content is usually mostly static. Still, though, if you have 25 TB of dynamic content, I presume bandwidth stops becoming the limiting factor in your cost of operation.
I'm not sure if RubyGems gets more traffic/has more intense computational needs/has more users than OkCupid, but that used to be hosted for about ~2-3K/mo from what I recall. However, that's not amortizing the cost of the hardware.
Only the person costs associated with the configuration/management of the servers - not the people time associated with the code, and high level system administration (which you still need with AWS).
I.e., the people who bought the servers, racked the servers, went down to the colo at night, set up the virtualization environment, hooked up the routers, configured the routers, the switches, the firewalls, the VLANs: those people I am including.
I'm not including the DBAs who manage the schema, people who push the code, do the design, etc...
I've worked for one of their (direct) large competitors, but haven't worked for those three companies.
I've currently got active accounts with all three of those VPS providers - I love them, and use them every day - particularly Linode, but also Slicehost/Rackspace, and DigitalOcean. I even have a bare metal server at ServerBeach - which I realize I need to shut down...
At this exact instant I have six terminal windows open across DO/Linode. I host a moderately popular California Food Blog, and have about 15 years experience in various companies that have had hosting responsibilities.
I'm not saying you can't do great things with the VPS providers. I'm just suggesting that the savings of $2-$3k (at most) with DigitalOcean would be more than offset by the technology risk and the hassle of having to re-invent a lot of the services that you get automatically from AWS.
That could change sometime in the (near) future - but right now, AWS is an easy (and honestly, all things considered, relatively cheap) solution for this type of application.
I'm referring to DigitalOcean's technology. AWS (and Media Temple) had its teething pains as well in its first 4-5 years. I remember some fairly broad outages with their back end storage, but they've mostly dealt with those, and the risk has gone down.
Note - there is another option: deploy on multiple platforms and be smart with your DNS balancing (http://www.dnsmadeeasy.com/services/global-traffic-director/) when serving content. Particularly now that DigitalOcean is in Singapore/Amsterdam/New York, I can think of some useful things I could do with $10/month droplets (2 terabytes of transfer each). $300/month, in theory, gets me 20 terabytes in Asia, 20 terabytes in Europe, and 20 terabytes in North America. Now, whether DO would shut me down if I actually started using that transfer is another question altogether...
There is no such thing as an open source CDN; there is a free CDN called Coral Cache (http://www.coralcdn.org/). CDNs cost money because you're dragging content from an origin to edge locations all across the world, and keeping it hot for client requests.
You could simply serve the content out of nginx, but you wouldn't see the performance benefits of keeping your content closest to the end user.
> The bias towards AWS for this type of application is ridiculous and a big waste of money.
They could get an even better deal by just going through a dedicated server provider (or even better, colocating).
There's little advantage with choosing DO versus going with a dedicated server provider (and again, colocating). I guess the advantage would be the control panel that they wouldn't use, having a few one-click stacks that they won't use, stuff like that.
If someone can afford a $7,000 AWS bill they can afford to put some money towards hardware and an onApp license if they want "cloudy" stuff. To colocate their hardware it would probably run them anywhere from $400-$800 a month depending on where they go. Their total bill would be decreased by $5500 a month. The upfront investment of the hardware wouldn't be more than $12,000 either. LOE? Probably two weeks with a competent sysadmin.
Yes you can have issues with your hardware and stuff and then you have to take care of that, but if you're good with your DC, they're great to you.
Bandwidth is by far their biggest cost, colocation/dedicated hosting would save a substantial amount but you are still going to be looking at something in the ballpark of $1,000/mo for 1Gbps. (Unless Cogent has slashed prices even further)
At $600/month you've only saved them $1,500/month (the hosting portion is only $2.1k), and now they also have physical hardware to manage, requiring a broader skillset from the volunteers, plus someone having to be in physical proximity for 'on-call' issues.
I don't know what datacentres tend to charge for data transfer, but as that's the largest item on the bill, it's the more salient point.
Also, just because it's not on the bill doesn't mean they're not using other AWS services; there are several free ones.
How much does that really matter? Even going to the other side of the world is only 200ms or so, and the time taken to run RubyGems is hardly a factor in just about any workflow I can imagine.
Think of it another way: what would be more valuable, RubyGems hosted on a CDN, or RubyGems on DO plus a couple of grants for talented hackers to work on their gems full time for a few months (à la GSoC)?
Even if you ARE concerned about latency, have one download server in the US (E.g. DO), one in Europe (e.g. Hetzner) and one in SE Asia (Not sure who's cheap and good-ish there), and you'd still be a 1/4 the cost of AWS bandwidth or less.
Person who pays the bill here. Latency does matter, and we're paying an additional $1.5k, new in February, to improve European latency. RubyCentral can afford to spend the money to improve latency issues, so we do!
Really, why would that be the case? I mean the difference in latency.
A self-set-up Linode CDN with all six locations would have provided 48TB of pooled bandwidth at a very decent speed, for around $480. Linode's network is great, much better than DO's. I am not sure if it matches CloudFront, which isn't exactly the fastest CDN anyway.
I have 10 fingers, so that is definitely not "countless" hours of work. And no, maintenance is minimal or non-existent. You could even put a smaller VPS behind each NodeBalancer for HA, since Linode VPSes (unlike DO's) are deployed on physically different hardware.
While I say it is fair enough to use AWS because money doesn't matter, I thought there are definitely some better alternatives for the same price (if you really cared about latency), or cheaper options.
That's been my impression so far with repositories where you configure a mirror explicitly, like Debian or CPAN. I used to be diligent about doing "the right thing" and switching out the default (usually something in the U.S.) for a Danish or nearby mirror. But I've stopped caring much because it doesn't really seem to make any perceptible difference. If I remember to, I still will switch it just so I don't unnecessarily waste intercontinental infrastructure, but it doesn't make much difference to my own experience.
It's often difficult to migrate providers when the application is complex or the owners see the value in the provider. They're falling right into most cloud providers' evil plan: make it cheap to get started, then let it become more difficult to migrate away as time moves on.
We moved the stack to Amazon in about 60 hrs last year (gems were already on S3). Given that time involved writing a lot of chef recipes, I'd say if pushed we could move out again in an even shorter period of time.
I guess hosting it on AWS is a benefit for integration with other services hosted on Amazon, like Travis CI (the most popular CI for open-source Ruby projects) and Heroku (the most popular hosting for Ruby projects).
You could easily knock multiple thousand bucks off of that by just reserving the EC2 instances you know you'll need, plus reserving the CloudFront bandwidth you know you'll need (for the amount of data served, I believe you should be able to cut CF costs by at least half).
3-year heavy EC2 reservations pay for themselves in ~7 months; CloudFront reserved bandwidth is just a 12-month agreement, so that costs nothing up front. You might want to experiment with some different instance types though, depending on your resource utilization. Personally I really like using the new c3.large instances for my web servers and anything else that needs more CPU than memory, proportionately. If the standard instances suit your needs better, you still might want to move to the m3 class.
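The pay-for-itself claim is just break-even arithmetic; a sketch with placeholder rates (deliberately not current AWS prices):

```ruby
# Break-even point for a reservation: months until the upfront payment
# is recovered by the gap between on-demand and reserved hourly rates.
# All rates passed in are illustrative placeholders.
HOURS_PER_MONTH = 730

def breakeven_months(upfront, on_demand_hourly, reserved_hourly)
  monthly_saving = (on_demand_hourly - reserved_hourly) * HOURS_PER_MONTH
  upfront / monthly_saving
end
```

With, say, a $350 upfront fee and a $0.07/hr discount, the reservation pays for itself in roughly 7 months; everything after that is pure savings.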
Aside from those two items, it looks like you are sending out a considerable amount of stuff from EC2 to the internet (27 TB transfer out from US-East). I'd recommend looking at whether you could set up a CloudFront distribution with your EC2 servers as its origin.