
Rubygems.org AWS bill for Feb 2014 [pdf] - vbrendel
https://www.dropbox.com/s/3v13d2g8klwer3d/AWS%20-%20RubyCentral%20-%20Feb2014.pdf
======
evanphx
I thought I'd answer some of your questions, as the person that pays the bill.

1\. This can be cheaper on AWS. We've been meaning to move to reserved
instances, paying a year at a time, for a while and simply haven't done it
yet.

2\. Fastly has already donated CDN usage to us, but we haven't fully utilized
it yet as we're (slowly) sorting out some issues between primary gem serving
and the bundler APIs.

3\. RubyCentral pays the bill and can afford to do so via the proceeds
generated from RubyConf and RailsConf.

4\. The administration is an all-volunteer (myself included) effort. Because
of that, paying a premium to use AWS has its advantages because it allows
more volunteers to help out given the well-traveled platform. In the past,
RubyGems was hosted on dedicated hardware within Rackspace. While this was
certainly cheaper, it created administrative issues. Granted, those can be
solved without using AWS, but we get back to again desiring to have as little
friction on the administration as possible.

Any other questions?

~~~
teeparham
Do you know who are the biggest consumers of bandwidth? I would guess the CI
servers (Travis, Circle)

~~~
dwwoelfel
I think that bandwidth consumed by Circle should be free, since we're also
hosted in AWS. Maybe somebody who knows more about the details of Amazon's
billing can confirm/deny.

~~~
DrJ
bandwidth is free in the same region, but not across regions.

*edit: and I believe not if you end up using the public IP address instead of the internal IP address.

~~~
tbarbugli
if you use the ec2 public dns it will resolve to an internal ip when the
request comes from within ec2

------
patio11
While one could probably knock a couple thousand bucks off that if one cared
to (which is probably penny wise and pound foolish but invariably comes up in
HN discussions of hosting costs), the amazing thing is that hundreds of
thousands of people worldwide are able to use core infrastructure which costs
less than the fully-loaded cost of a single billing clerk in your local
municipal water department.

~~~
jcampbell1
What is funny is that Github is footing the bill for most package systems,
which were likely inspired by ruby gems, yet Github itself was built with Ruby
gems. I am pretty sure the hosting costs for homebrew/npm round to nil (I
could be wrong).

~~~
jpfuentes2
If you mean the npm homebrew package, then yes. If you mean npm packages, then
you might be living under a rock.

------
amalag
BTW this is paid for by : [http://rubycentral.org/](http://rubycentral.org/)

~~~
evjan
Thank you, Ruby Central!

------
incision
At a glance, this looks like AWS being used like a dedicated host, which as
demonstrated, isn't exactly cheap.

There's no spot or even reserved pricing, just a bunch of on-demand instances
that were up 24/7 for all 28 days in February.

Seems like a genuine dedicated host, reserved instances or an architecture
that leverages the elastic in elastic compute cloud would be worth
considering.

~~~
saurik
A lot of the price is bandwidth. They are effectively being reamed by using
CloudFront instead of negotiating a better rate with a "real" CDN (which will
also give them much better performance, as CloudFront doesn't have many edge
locations).

(Although, actually, while I verified their total dollars spent is greater
than what would be required to get a fundamentally better deal on bandwidth, I
didn't take into consideration that once you slash their costs the amount they
would be paying might no longer be ;P.)

~~~
vacri
_CloudFront doesn't have many edge locations_

This is nonsense. They have more edge locations than most. I didn't try all
comparators in the list, but out of half of them I tried, none had more than
Cloudfront:
[http://www.cdnplanet.com/compare/cloudfront/maxcdn/](http://www.cdnplanet.com/compare/cloudfront/maxcdn/)

So if Cloudfront has 'not many', who has 'many', and how many is that?

~~~
saurik
MaxCDN is a very low-end "CDN". If you can buy your account from the website
without talking to an account manager, and the plans are as low as $9/month,
you should not expect a lot of performance, features, locations, etc.: what
you should, however, expect is "cheap"... MaxCDN is appropriately cheap.

To look at something more reasonable: CDNetworks is realistic competition;
they are strong in Asia, and were the people I was comparing the pricing to
(so they aren't going to be horribly expensive). According to the comparison
website you are using, they have almost four times as many edge locations.

[http://www.cdnplanet.com/compare/cloudfront/cdnetworks/](http://www.cdnplanet.com/compare/cloudfront/cdnetworks/)

Honestly, though, the reality is that the really great CDNs don't even have
data on this website (even for CDNetworks I think this data is not accurate:
looks like an approximation): the leaders in this space are Akamai and
Limelight, and both just show "Not Available" for the number of edge nodes
they have.

Even going a little lower on the CDN pecking list, though: Level3, which
according to this website you are using is mostly "competitive" with
CloudFront (sometimes actually worse) in the regions CloudFront bothers to
cover, is clearly covering _entire subcontinents_ where CloudFront has
nothing.

The reality is that CloudFront is still trying to grow out a network: they
have poor coverage in Europe (which is pretty key), a few nodes in
Japan/Singapore, and then next to no coverage anywhere else. Yet, they insist
on pricing their product as if they were a big player (12c/GB is Akamai-level
expensive).

(So, do I get to condescendingly say "this is nonsense" now? I mean,
seriously: you clearly didn't spend much time using this website and you
didn't look into who the leaders are to verify you weren't comparing low-end
to low-end... also, I think you are not appreciating that 0->2 is infinitely
better ;P.)

~~~
vacri
Well, like I said, I didn't check just one, but about half in the list -
maxcdn was just in the link because you can't link that site to just one
service. Akamai had nothing listed, and level3 and cdnetworks weren't in the
ones I checked. From what I saw, they still have more than most.

I still think you're mischaracterising AWS as being a bit player - they have a
decent presence with Cloudfront, it's just that there are a couple that are
bigger. Like I originally said, 'more than most'. CDNetworks certainly does
pound them in numbers, though.

------
stickydink
Right now we're a top 25 grossing iPhone game developer. The last AWS bill I
saw was January's, a little under $200k.

I'm not on the server team, so I don't know exactly what contributes most to
it. But part of me really thinks it could be reduced!

~~~
jcampbell1
This bill is 2/3 bandwidth, and 1/3 compute.

Some games require massive amounts of compute, but the bandwidth to deliver
the assets is generally paid by Apple.

I can guarantee you, your company is paying a metric fuck-ton more. It is
called Apple's 30% cut.

Your company is paying AWS $200k to pass json messages around for analytics
and social aspects of the game. You are paying Apple something like $1 million
per week to distribute, market, and collect payments for the game.

I am not saying your company is dumb, or Apple is evil. I am saying your
experience and anecdote isn't relevant to Ruby Gems, and offering a different
way to think about the games industry vs. the open source software
distribution world.

~~~
stickydink
We aren't paying that much in cut just yet. We're a small team (6 engineers in
total). You don't have to be pulling in millions per week to get high on the
grossing charts. We're probably around 1/4 of what you estimated.

Though you mention delivering the assets. Actually (like a lot of games) we
make a big effort in getting under the 50MB over-the-air limit on the App Store.
The total content for retina iPhone is ~300MB, delivered in parts as you
progress in the game. That's kept on S3, downloaded through CloudFront.

But yes! You're right, it's mostly a hell of a lot of JSON flying around.

~~~
mbaird
FYI, the OTA limit was increased to 100MB back in September.

We're managing to squeeze our apps into this at the moment, but will likely
need a similar solution using S3/CloudFront in the near future.

[1] [http://www.macrumors.com/2013/09/18/apple-increases-over-the...](http://www.macrumors.com/2013/09/18/apple-increases-over-the-air-app-store-download-limit-to-100mb/)

~~~
stickydink
We support non retina devices, which are stuck on iOS 6. When this came out we
weren't sure whether it applied to that too, so stuck with 50. We'd already
been keeping it under 50 for 6 months by then so we had all the infrastructure
set up, and it's mostly automated.

Haven't looked at it since iOS 7 launch though, do you know if it was iOS 6
too?

~~~
mbaird
Just checked with a couple folks here, the limit is for the iTunes store, and
therefore was also raised for iOS 6 (we assume older versions as well, but
don't support them either).

~~~
stickydink
Interesting, thanks. Maybe next time we update, I'll try to convince everyone
to panic-delete a little less :)

------
wbond
Package Control is a far cry from the scale of RubyGems. PC uses a little over
2TB a month, whereas my calculations show RubyGems using around 50TB.

That said, early on I chose Linode because of their generous bandwidth that is
included with the boxes. For the price of less than 1TB of AWS bandwidth, I
get 8TB, plus a decent box. The bigger boxes have an even bigger proportion.

I'm not posting this to give any suggestions for RubyGems - I know nothing of
the complexity of that setup. Mostly just figured I'd share the research I did
for finding reasonably priced bandwidth.

------
ilaksh
The thing is there are many providers who can do the same and most of them
will do it for less than half of this. Some less than 1/5th. I think they
should move this to Digital Ocean and save $5000.

The bias towards AWS for this type of application is ridiculous and a big
waste of money.

~~~
ghshephard
Whenever anybody makes this type of statement, I'm always interested in knowing
if they've ever run a site with this type of traffic, and this many customers.

In particular, have you ever run a site that consistently serves over 25
Terabytes of traffic/month, or have you worked with someone who has?

I guarantee you that no company I have worked for in the last 15 years could
have ever run this type of infrastructure for $7K/month. It's absolutely
amazing.

~~~
Zarel
My site serves 25 TB/mo, and it costs me $80/mo...

$60/mo for a dedicated server, $20/mo for CloudFlare. The dedicated server
only serves 1 TB of it, the other 24 TB is static assets cached and served
directly by CloudFlare.

Here's a screenshot of CloudFlare Analytics for the last 30 days:
[http://d.pr/i/6Z8S/5GU2Ni8t](http://d.pr/i/6Z8S/5GU2Ni8t)

~~~
ghshephard
Thanks - that's eye opening.

So, what this really comes down to (after a good night's sleep) is what type
of traffic/transactions you are running on your back end infrastructure.

If the data is static, then you can probably (these days) cut your costs for
25 Terabytes/month from $8K to $800 (or, in your extraordinary case, $80),
simply by being a bit intelligent as to how you make use of VPS/CDN/CloudFlare
Transfer allocations.

On the flip side, if much of the data you are transferring out is the result
of dynamic back end transactions, queries, and generation, then it's unclear
to me that you can (easily) recognize the savings that you might see when
generating static content.

I'm interested in knowing if CloudFlare will start throttling/shutting down
people who pay $20 and use 25 TBytes in the long term though - that alone, for
some organizations, will cost them more than the extra $8K they would pay to
AWS (who have zero problem with you using 25TB, 250TB, 2.5PB, etc...)

~~~
Zarel
Yeah, I'll admit that other CloudFlare customers are likely subsidizing the
amount of bandwidth I'm using.

Funny thing - back when I was using 10 TB/mo, my site was hosted entirely on
DreamHost's $9/mo shared hosting. I moved mostly because I was starting to get
several hours a month of downtime - presumably, they were gently nudging me
off their service.

I've seen plenty of $60-$100 dedicated servers come with unlimited-use 100Mbit
connections, which work out to 16ish TB/mo before you start getting to 50%
saturation. Of course, those are still subsidized in that that pricing is
possible only because most people who buy it don't max out a 100Mbit
connection.

Still, though, S3's 9-12¢/GB bandwidth pricing seems a bit high. Bandwidth at
DigitalOcean (presumably unsubsidized) is 2¢/GB, which comes out to a much
more manageable $500 for 25 TB.

With dynamic content, CloudFlare has Railgun, which takes advantage of the
fact that dynamic content is usually mostly static. Still, though, if you have
25 TB of dynamic content, I presume bandwidth stops becoming the limiting
factor in your cost of operation.
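
The two figures in this comment (about 16 TB/mo on a half-saturated 100Mbit link, and $500 for 25 TB at 2¢/GB) can be checked with a rough sketch. It assumes a 30-day month, decimal units (1 TB = 1000 GB), and a flat per-GB rate; real AWS pricing is tiered, so the 9¢ and 12¢ results are only bounds. The function names are made up for illustration.

```python
def monthly_tb(link_mbit, utilization=1.0, days=30):
    """Terabytes (decimal) moved in a month over a link at a given average utilization."""
    bytes_per_second = link_mbit * 1_000_000 / 8
    seconds = days * 24 * 60 * 60
    return bytes_per_second * utilization * seconds / 1e12


def transfer_cost(tb, dollars_per_gb):
    """Cost in USD of transferring `tb` terabytes at a flat rate (1 TB = 1000 GB)."""
    return tb * 1000 * dollars_per_gb


# 100Mbit link running at 50% saturation for a month
print(f"{monthly_tb(100, utilization=0.5):.1f} TB/mo")

# 25 TB at 2 cents/GB, then at the quoted 9 and 12 cents/GB
for rate in (0.02, 0.09, 0.12):
    print(f"25 TB at ${rate:.2f}/GB: ${transfer_cost(25, rate):,.0f}")
```

Under these assumptions the half-saturated link moves roughly 16 TB a month, and the same 25 TB costs about $500 at 2¢/GB versus somewhere between $2,250 and $3,000 at the quoted 9-12¢/GB.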

------
jpfuentes2
There's already 30+ comments on this thread and no one has pointed out the
obvious: this is all for the peanut gallery to laugh at Npm, Inc.

If the bill remained relatively consistent they could host Rubygems.org for
~28 months with 200K.
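
As a quick check of the ~28 month figure, here is a minimal sketch. It assumes the monthly bill is the sum of the four line items quoted elsewhere in this thread (CloudFront, Data Transfer, EC2, S3) and that it stays flat.

```python
# Line items as quoted in the thread's summary of the Feb 2014 invoice (USD).
monthly_bill = 1071 + 3597 + 2184 + 228  # CloudFront + Data Transfer + EC2 + S3
budget = 200_000

months = budget / monthly_bill
print(f"${monthly_bill:,}/month -> {months:.1f} months of hosting")
```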

------
hurrycane
We ran into the same cost-related problems for our CDN. What we did to solve
it was to rent dedicated servers that are near AWS regions. We used Route53
latency-based routing to route traffic to those dedicated servers + Nginx +
Lua. We're serving 300+ TB of traffic per month and the total price is just a
percent of the RubyGems AWS bill. There is some maintenance included with this
solution and the problem is finding the right dedicated server providers.

------
reustle
That's not as bad as I was expecting. I was once working with a startup's
infrastructure (>100 servers) and it was near 20k/mo (mostly reserved
instances)

~~~
jayvanguard
Yes, this seems quite reasonable considering the scale it handles.

------
jtrtoo
Since it can take a bit of time to read through the invoice, here's a summary
of the bill:

CloudFront: $1,071
Data Transfer: $3,597
EC2: $2,184
S3: $228

While "bandwidth" costs equate to ~$4,668/month, only $1,071 is CDN
(CloudFront), with the balance just raw Data Transfer.

Since lots of folks are commenting, and not everyone realizes the difference,
it's also a good time to point out the CloudFront vs. Data Transfer
distinction.

Using Amazon's terms... Data Transfer means anything directly served/coming
from EC2 or S3 (or a few other services which aren't relevant here), but NOT
anything for CloudFront (which is, obviously, a separate line item, as shown
above).

The bulk of CDN (CloudFront) usage ($735 worth or 69%) is US.

The bulk of raw bandwidth (Data Transfer) usage ($2,931, ~80%) is US East.
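
The totals and percentages above can be reproduced with a short sketch. It assumes the four line items in this summary make up the whole bill (the invoice itself may carry other small items), and the `share` helper is just for illustration.

```python
# Line items as quoted in the summary above (USD, Feb 2014).
bill = {
    "CloudFront": 1071,
    "Data Transfer": 3597,
    "EC2": 2184,
    "S3": 228,
}

total = sum(bill.values())
bandwidth = bill["CloudFront"] + bill["Data Transfer"]


def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(f"total: ${total:,}")
print(f"bandwidth: ${bandwidth:,} ({share(bandwidth, total)}% of total)")
print(f"US share of CloudFront: {share(735, bill['CloudFront'])}%")
print(f"US East share of Data Transfer: {share(2931, bill['Data Transfer'])}%")
```

On these numbers, bandwidth comes to roughly two thirds of the bill, which matches the "2/3 bandwidth, 1/3 compute" estimate made elsewhere in the thread.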

~~~
jtrtoo
Is any of this good/bad/right/wrong? I have no idea. That depends quite a bit
on what THEY are doing with it and why. For example, it can be cheaper to
distribute from CloudFront versus straight from S3 for some use cases. Though,
generally, you are not only looking at using CloudFront to save money over S3
...there's typically a performance reason.

And sometimes the hosting costs simply don't matter. It's easy for us
engineers - sitting here on HN - to sit at our keyboards and play around with
hypothetical ways to save money. This isn't necessarily a bad thing, but there
are numerous things in IT that it doesn't make sense to optimize. Why? Because
the ROI on the engineering time, CapEx, and OpEx (and the time, energy, and
focus of ANYONE involved or impacted at all) to do the optimization doesn't
outweigh the opportunity cost.

Sometimes there are simply better uses of our limited capital and time.

Not everything needs to be optimized. And the argument gets stronger when
there are other factors more difficult to factor in: adopting a platform that
isn't as widely known or isn't backed by a similar level of maturity (even
with its quirks, at least they are well known), etc.

The risks/concerns not only vary between organizations, but often from one
period of an organization's growth to the next. The beauty is every
organization gets to make their own decision ...and none of them have to give
a damn if the HN community agrees or not. :-)

------
SeoxyS
While by no means insignificant, this bill is nowhere near what I'd imagine
would warrant a HN post. I wouldn't be surprised if most startups beat this
regularly.

The startup whose backend I co-created racks up an AWS bill that hovers around
a half million dollars a month. We make use of all of the ways to save with
Amazon: pre-paid reserved instances, negotiated deals, etc. And we're not even
that big; imagine what Netflix's AWS bill must be.

We've tried other providers, toyed with co-locating, but at the end of the day
the flexibility and cost benefit of IaaS outweighed the lower base price of
CPU cycles when you roll it yourself.

~~~
twotwotwo
> this bill is nowhere near what I'd imagine would warrant a HN post.

Can only guess at why folks like any post, but it's not necessarily how large
the bill is. Maybe it's how low it is for a service that's widely relied on,
or maybe it's the level of transparency, which turned out to include evanphx
above showing up to answer questions about the project.

~~~
matthewrudy
Absolutely, this is a transparency thing.

Compared to npm asking for $300,000 in donations to keep the thing running.
I'm glad RubyGems can run for relatively so little, and be transparent in
doing so.

------
ne0lithic
With most of this being bandwidth costs, it seems like switching to a host
like Digital Ocean would make more sense here. The bandwidth costs are a
fraction of Amazon's in comparison.

As for the CDN, switching to something like Cloudflare might make more sense
rather than relying on Cloudfront. At the least, there's a "US and EU only"
option for edge locations to use which is considerably cheaper than the
default option of all edge locations.

~~~
mje__
Wow, as someone who uses rubygems all day and is not in "US and EU only", I'm
glad you're not involved in this project.

------
sampierson
Why was this even posted? Looking for help reducing it? Complaining about the
amount spent? Looking for a pat on the back?

I saw a talk at Ruby/RailsConf about the work spent building and maintaining
rubygems.org. It smelled a bit martyrish. "Look at the thankless work we
perform behind the scenes".

Well, if help is required building or operating rubygems.org, please just say
so. As a seasoned Ruby developer I'd be more than happy to contribute
development time, and as a daily user I'd be willing to commit financially in
a small way towards operating costs. Not that that is required - given all the
offers of free hosting this post received in response.

If we don't know about a problem, we can't help. Just ask if help is what you
want. It's not like the Ruby community doesn't have great communication
channels.

------
reillyse
This seems reasonable to me? Why is this a newsworthy item?

~~~
pataphysician
Transparency is nice and the guy paying the bill answered questions for
curious minds.

------
rebyn
How can I donate to Ruby Central? Checked out their "Support" page yet it
didn't help much. Any easier ways like donating via Paypal?

------
tyw
could easily knock multiple thousand bucks off of that by just reserving the
ec2 servers you know you'll need, plus reserve the cloudfront bandwidth you
know you'll need (for the amount of data served I believe you should be able
to cut CF costs by at least half).

3 year heavy EC2 reservations pay for themselves in ~7 months, cloudfront
reserved bandwidth is just a 12 month agreement so that costs nothing up
front. You might want to experiment with some different instance types though,
depending on your resource utilization. Personally I really like using the new
c3.large instances for my web servers and anything else that needs more CPU
than memory, proportionately. If the standard instances suit your needs better
you still might want to move to the m3 class.

Aside from those two items it looks like you are sending out a considerable
amount of stuff from EC2->internet (27 TB transfer out from US-East to
internet). I'd recommend looking at whether you could set up a cloudfront
distribution with your EC2 servers as its origin.

------
colinbartlett
I had no idea it cost this much to host rubygems.org.

The website says that hosting is provided by BlueBox?

~~~
dschwartz88
This seems to be mostly their CDN bill. Not sure, but I don't really consider
a CDN as part of hosting fees, more of a general infrastructure fee.

~~~
davidradcliffe
This includes all our compute fees too.

------
lassebunk
Could it be possible to cache the version list locally and then just update it
incrementally, e.g. via Git? Wouldn't this save both download time (for us),
and bandwidth (for RubyGems)?

------
ww520
Interesting. Looks like most of it is bandwidth cost.

------
senthilnayagam
Assuming there is a direct correlation between requests and projects, we can
make a guesstimate of the number of active Ruby developers and projects.

------
soheil
That's it?

------
dushyant
WHY DOWNLOAD!!! WHY

