AWS Customers Rack Up Hefty Bills for Moving Data (theinformation.com)
421 points by ballmers_peak on Oct 21, 2019 | 238 comments


I recently started running more Google Cloud VMs in the UK instead of the US. It was costing me £2 a day to run the VMs, but somehow I also paid £12 a day for "GCP Storage egress between NA and EU".

Turns out that by default Google's Docker container registry only stores the Docker images in the US. So each time I launched a VM the Docker image was downloaded from the US. I wrote more about it here: https://www.mattzeunert.com/2019/10/13/reducing-docker-image...

The billing interface didn't show that the Cloud Storage cost was related to the Docker images. I investigated my normal Cloud Storage use, but that didn't explain why I was being charged so much. Only after a few days did it occur to me that the Docker images might be causing it.
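For scale, a rough sketch of how cross-continent image pulls add up. The image size and launch count below are hypothetical, not the poster's actual numbers, and the rate is a typical GCP inter-continental egress figure:

```python
# Back-of-the-envelope: egress cost of re-pulling a Docker image from a
# US-hosted registry on every VM launch in the EU. All inputs are assumed.
GCP_NA_TO_EU_EGRESS_PER_GB = 0.12   # USD/GB, typical inter-continental rate
image_size_gb = 1.5                  # assumed compressed image size
vm_launches_per_day = 80             # assumed: each launch pulls from the US

daily_cost = image_size_gb * vm_launches_per_day * GCP_NA_TO_EU_EGRESS_PER_GB
print(f"~${daily_cost:.2f}/day just for pulling images across the Atlantic")
```

With numbers in that ballpark, the image pulls easily dwarf the £2/day compute cost.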


Actually, there is an EU mirror for the GCP registry: eu.gcr.io


That seems like the kind of thing that should be configured by default on servers in the EU.


It's not really clear to me what that would mean. Auto-replicating customer data like docker images across national borders is a non-starter for legal compliance reasons. That's got to be an opt-in, not an opt-out. So they can't just point outbound GCR requests at the EU registry and assume the customer's images are there.


It's in their docs


Their docs say it’s configured by default on EU servers?


You say what image to use. There is no default.


It's not a mirror, it's just another place to put your images. If you're EU-based then it's definitely better to use eu.gcr.io for everything from the start.


Yep, I started copying my images to the EU and Asia regions and now it's fine cost-wise. They don't have an Australia region for the container registry though, so launching VMs there is still pricey for me.


Is there a plausible explanation for why egress fees from cloud providers cost around $0.1/GB? "Traditional" server providers such as Hetzner are able to offer bandwidth at orders of magnitude lower prices (e.g. $1.1/TB). I understand that cloud providers may have better interconnects or better uptimes, but that doesn't justify pricing that is orders of magnitude higher.
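Putting the two quoted rates side by side (both approximate: $0.10/GB is the common hyperscaler internet-egress tier, $1.1/TB is the Hetzner overage figure above):

```python
# Compare per-TB egress pricing between a typical hyperscaler rate and a
# budget host's overage rate. Both figures come from the comment above.
cloud_per_gb = 0.10          # ~$0.10/GB at AWS/GCP/Azure for internet egress
hetzner_per_gb = 1.1 / 1000  # $1.1/TB quoted for Hetzner overage

print(f"cloud:   ${cloud_per_gb * 1000:.0f}/TB")
print(f"hetzner: ${hetzner_per_gb * 1000:.2f}/TB")
print(f"markup:  ~{cloud_per_gb / hetzner_per_gb:.0f}x")
```

So "orders of magnitude" is literal here: the gap is roughly two orders of magnitude.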


Disclaimer: I worked on an AWS service team.

This is, oddly enough, similar to a debate people have about consumer TV or Internet service: should pricing be "unlimited" or "a la carte"?

AWS is combining all your networking charges into one lump "outgoing data transfer" fee. So it's heavily marked up compared to what they're paying for the outgoing data transfer, and it's unclear how much of that is profit versus covering all their other costs.

So it might be fairer if AWS broke out separate line items for internal, incoming and outgoing data transfer, plus all the additional systems a customer uses.

I think AWS's billing is probably already on the falling side of diminishing marginal returns. That is, it's complex enough that more information would tend to hinder customers from getting the best price. Right now, if I plan to reduce my data charges, I have one variable to tinker with. If we expanded this, I'd have to balance incoming, internal, and outgoing charges. That sounds simple, but in engineering terms it can be very complex.

The next claim is that this biases customers not to move. Of course, Azure and GCP have the same arrangement, so while you pay to move out of AWS, you don't pay to move into Azure or GCP. All the vendors are attempting to lock you in to their product while trying to extricate you from their competitors', so overall it's a wash.

So, yes, part of the motivation for egress charges is that ingress is a loss leader. But it's also true that egress is a metric that does, for the vast majority of their customers, directly translate into customer value. If there's a compelling case for doing it differently, someone should do it and see if it works.


> If there's a compelling case for doing it differently, someone should do it and see if it works.

Cloudflare doesn't charge for bandwidth. I always throw cloudflare on top of anything I do, not because I really need a CDN or anything, but because the bandwidth cost would bankrupt me otherwise. The ceo of cloudflare gave the rationale on why they don't charge:

> There’s a fixed cost of setting up those peering arrangements, but, once in place, there’s no incremental cost. That’s why we have similar agreements to Backblaze in place with Google, Microsoft, IBM, Digital Ocean, etc. It’s pretty shameful, actually, that AWS has so far refused. When using Cloudflare, they don’t pay for the bandwidth, and we don’t pay for the bandwidth, so why are customers paying for the bandwidth? Amazon pretends to be customer-focused. This is a clear example where they’re not.

https://news.ycombinator.com/item?id=20791563


According to Cloudflare, they do not have any bandwidth pricing arrangement with Microsoft for Azure users.

They also do charge for Enterprise plans, but instead of transparent pricing I got high-pressure sales techniques and black box pricing offers - which then anchored our rate so that as we grow past our current contract, we're forced to upgrade at any point with pricing based solely on our original negotiation.

Frankly, while I save money using Cloudflare over Azure's CDN right now, it's left a very sour taste in my mouth and I'll be jumping their ship as soon as I have time to find a suitable alternative.


> high-pressure sales techniques and black box pricing offers - which then anchored our rate

If you have the ability to shift your entire enterprise CDN away from them, why not first try renegotiating?



Cloudflare most certainly disables zones on the free plan that use excess bandwidth. Enterprise contracts are also negotiated based on transit and those prices mirror comparable CDN services.


All that tells us is that cloudflare has a different revenue stream. Amazon is a business and they are in the business of making money. If they weren't charging for egress bandwidth they'd just charge for something else.


"I think AWS's billing is probably already on the falling side of diminishing marginal returns. That is, it's complex enough that more information would tend to hinder customers from getting the best price. Right now, if I plan to reduce my data charges, I have one variable to tinker with."

No, it's two variables - the egress charges you refer to and the actual cost to store the data.

We[1] have found that it is, as you might expect, quite a bit simpler to charge for just the storage and forget about metering the usage/bandwidth/transfer.

So we have typically had our price point higher than the B2s or Wasabis of the world, but there's just one simple number to think about - and no potential for surprises in the billing.

I will admit to having a bit of concern over adding 'rclone'[2] to our platform and the potential for users to just burn bandwidth using an rsync.net account as a "transfer host" - but that is why we peer with he.net and their cheap and plentiful 10Gb pipes.

[1] rsync.net

[2] ssh user@rsync.net rclone s3:/bucket gdrive:/blah/blah


And how many PoPs, regional interconnects, highly available high-throughput connections, and cross-continent highly available connections do you have? Do you detect failures across these connections? Do you detect grey failures? Do you have a team of infrastructure engineers to look after this network?


We keep all of those to an absolute minimum and avoid as much complexity in our infrastructure as possible.

Which is to say, each of our five[1] regional POPs has a single connection provided through a dumb switch one hop from he.net[2].

They have no interconnection or dependencies to one another.

No routers, no firewalls, no balancing, no failover. When rsync.net fails, it's a very, very boring failure.

We've had zero network outages in the last 60 months or so.

[1] Fremont, San Diego, Denver, Zurich, Hong Kong

[2] init7 in Zurich ...


> So it might be fairer if AWS broke out separate line items for internal, incoming and outgoing data transfer, plus all the additional systems a customer uses.

The example given above for comparison, Hetzner, also doesn't charge for inbound and internal transfer AFAIK. Nor "the additional systems a customer uses". You pay a charge for the server, you get some amount of traffic included, and if you go over, the additional traffic costs something like $1.1/TB. That's all you pay.


> AWS is combining all your networking charges into one lump "outgoing data transfer" fee

> So it might be fairer if AWS broke out separate line items for internal, incoming and outgoing data transfer

This explanation doesn't cut it for me - most (all?) "traditional" VPS providers don't charge for ingress traffic, and I doubt anyone, ever, has charged for internal traffic.

So what exactly is 'all the networking charges' comprised of, other than egress data?


AWS is vast. I have no idea what their overall accounting for networking looks like. Even for the tiny service I worked on, it would be tough to guess what our overall costs were. We actually had an internal bill each month for all the regular AWS services we used, but then there were a host of internal services we depended on.

That companies don't charge for specific things doesn't mean those things don't cost them anything. It just means they're trying to work out a pricing scheme that scales with customer usage and is broadly understandable. So "data egress" is really just a proxy for "how much stuff you're doing with the networking subsystems of AWS."

Same thing with EC2, there are a whole pile of costs that are summed up with "time you rented an instance."


See a lower comment I made here; what I really want is a little transparency about pricing.

Of course there is an internal cost of doing business, and peripheral infrastructure costs - but if I pay $100 for service "A", I reasonably expect that fee to pay for service "A". Instead, egress bandwidth charges seem to be used to trick customers into thinking services are cheaper than they really are.


How many of these VPS providers are actually managing global, highly available network infrastructure?


I would have presumed that the infrastructure cost for each service was included in the cost for each service.

For egress bandwidth costs, I'd assume it included, well, the egress bandwidth cost.


I guess there aren't that many global network providers - I'm not even sure how much fiber Amazon owns in Japan, Australia or Northern Europe for that matter.

But I think level3 is associated with:

https://www.centurylink.com/business/hybrid-it-cloud/public-...

And while they have a call-us price list (if you have to ask...) - they at least state:

"Public and Private high-capacity networking options up to 10Gbps. Note: there is no charge for internal data center traffic. Cost on a per-GB-out model"

I have no idea what they charge per GB for this cloud product, however.


When you do outbound networking, you're sending it out to the internet, not mangling it within AWS's network


I get your point, but then why does AWS charge for inter-AZ traffic? That seems like an "egress but not really" kind of thing. If AWS/GCP stopped charging for this, customers would be incentivized to build HA systems and distribute their workloads across AZs, which is a win for both customers and the provider (since capacity is now spread out instead of stuck in one zone).


The doctrine for HA is that each AZ should be fully independent, and if you do that, your inter-AZ traffic is relatively minimal.

And I think the charges for inter-AZ transfer are to incentivize customers to do that.

Of course, to make them fully independent, you have to replicate everything, so you wind up buying several redundant copies of your system...


> Of course, to make them fully independent, you have to replicate everything, so you wind up buying several redundant copies of your system...

Yeah, and keeping around warm systems ready to failover in case of a zonal outage seems like a preposterous waste of resources.

The alternative... to keep around multiple replicas of your system in different zones, all ready to accept traffic and which do serve traffic, seems more practical and less wasteful.


This. Paying for n+2 capacity is really expensive when n=1, but pretty reasonable at n=5 or n=10. Until someone gouges you on data transfer …


It's not a waste when it serves a purpose. Availability is a big concern


Sure, from the perspective of availability, it makes sense to keep this around. The US military has redundancies in place to handle many kinds of adverse scenarios, which comes at a price, but is justifiable. The point I'm trying to make is that if availability is the _only_ value, it becomes hard to justify that if you're a scrappy for-profit corp looking at your bottom line.

If, instead of availability being the only value, there were more value provided by actually using such resources, more folks would adopt cross-AZ architectures. That would be a win-win for both the customer (get HA for lower or no cost, go down less often, and succeed in the market) and the cloud provider (keep raking in steady cloud revenue as the customer grows).


> so you wind up buying several redundant copies of your system

This is one of those gotchas that companies hit. They see the public pricing page and think "wow, that is much cheaper than what my internal IT department charges for X", and then when they go to actually implement it they find that "best practice" says they basically have to double or even triple the cost to get a reliable system (more, because not only do you have to duplicate all the infrastructure into a second AZ, you also get charged for the replication traffic between them).


But in all fairness, if you actually implement that “best practice” HA infrastructure, you will also be miles ahead of almost all internal IT departments.


Inter-AZ traffic is Metro Area Network ("MAN") traffic. There is a cost for running network between locations.

This explains the cost.

Price probably should be based on value, not on cost. Why do they charge for it? Because they decided it's a good way to make money and profit.


Something has to pay for the fat pipe between colos (AZs).


But those are probably far below $1/TB. Otherwise other colocation providers couldn't offer that even with peering.


The bandwidth alliance has a response to this approach: https://www.theregister.co.uk/2018/09/26/cloudflare_bandwidt...


Outbound bandwidth also happens to be an excellent place to put any markup you can, as that also locks people from migrating away so easily


Because AWS is designed to be like a pitcher plant, shaped like a funnel with inward-facing hairs to make it easy to get in, difficult to get out.

That's pretty much it - in my experience most of AWS' awesome features are designed to lock customers into an environment where Amazon increasingly provides all of the components and services you need to do business.

I would assume the endgame is to create an ecosystem where the vast majority of customers allow their IT function (infrastructure, developing software, engaging third-party SaaS vendors, etc) to atrophy entirely, after which they'll have no choice but to buy what Amazon is selling, at any price, in perpetuity.


It has no basis in reality. You can buy a 10Gbps transit circuit for $1500/month. That's $0.00046296296 per gigabyte.


That depends on how much saturation you can get. Most end users/companies can’t saturate this reliably, or have much spikier traffic. Only the biggest players can minimise their seasonality enough to get close to this, and doing that is a service that it’s probably worth paying for.


Who cares if you can saturate it? Breakeven is 18 terabytes/month, or 0.6% utilization on that line.

You could do this even just pegging the line 9 minutes per day and otherwise leaving it unused. ;)
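Running the numbers on that (the $0.09/GB figure is a typical AWS first-tier internet egress rate, so treat it as an assumption; the result lands in the same ballpark as the ~18 TB / 0.6% claims above):

```python
# Sanity-check the transit math: a 10 Gbps circuit at $1500/month vs.
# ~$0.09/GB cloud egress. Exact cloud rates vary by tier and provider.
SECONDS_PER_MONTH = 30 * 24 * 3600
circuit_gbps = 10
circuit_cost = 1500.0

capacity_gb = circuit_gbps / 8 * SECONDS_PER_MONTH   # GB/month at full line rate
per_gb = circuit_cost / capacity_gb
breakeven_gb = circuit_cost / 0.09                    # vs. $0.09/GB cloud egress

print(f"full line rate:     {capacity_gb / 1e6:.2f} PB/month")
print(f"cost at line rate:  ${per_gb:.6f}/GB")
print(f"breakeven vs cloud: {breakeven_gb / 1000:.1f} TB/month "
      f"({breakeven_gb / capacity_gb:.1%} utilization)")
```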


+1 for this.

People forget just how affordable it can be to maintain your own infrastructure. You can have the hardware and network capable of supporting 10X your average traffic loads and still have it operate far more cost effectively than the equivalent traffic on AWS.

At my work, we slashed our overall hosting costs by moving a data warehouse off of AWS and on to our own self-maintained infrastructure.

But with a bloated inefficient IT department or non-savvy negotiations with hardware vendors or transit providers, it can also be more expensive than AWS.


I have had this debate with so many people, AWS is not cheaper than your own hosting. Period.

What you get from AWS is that the logistics pipeline is already built, as is the infrastructure, should you suddenly need to serve several times more traffic.

The ROI of AWS comes from the backend and capex vs opex debates.

Example: the CFO can go to the board during any meeting and explain "we are getting ready to reduce opex by laying off 10 developers at $150k/yr". The capex cost is usually fixed and hard to explain away.


I think the best cases for AWS is counter to marketing, very small companies. Their native offerings do make things great for a while when you're starting out.

The other is stupendous burst activity, like when you just need a thousand (or thousands of) cores for a couple of hours. Of course this doesn't mean the baseload has to be in AWS; it's just easy for small teams.


Agreed.


I hear more and more cases of businesses moving parts off of the cloud, back on-premises. I think, in a wider perspective, it's going to play back and forth. The 2010s were 'forth' towards the cloud; the 2020s might see a swing back on-prem, etc.

It's generally a function of network cost versus infra cost, and so the 'equation' solves differently based on the longest tech cycles, from inception to maturity to skill pool to diminishing returns and then back again on some other mode — this really is a decade+ thing.

Some "always true" inherent advantages of one approach (e.g. availability for cloud, or resources for on-prem) would remain across cycles as permanent gains; disruption then occurs when some new approach (e.g. containerization) fundamentally upsets the order of costs.


> It's generally a function of network cost versus infra cost, and so the 'equation' solves differently based on the longest tech cycles,

It used to be that OpEx was easy and CapEx was hard to get approved. As people took advantage and OpEx went through the roof, there's now a lot more pressure to reduce OpEx.


Spoken like a true Chief Officer.

Pushing back on OpEx in favor of CapEx might be seen as a more long-term strategy too. Basically rent vs. purchase, even for depreciating assets like information systems, as I think the current DevOps trend (scaling in pure software, virtualization, etc.) makes it easier than ever to squeeze every last FLOP on-premises.


> Only the biggest players can minimise their seasonality enough to get close to this

Okay, like AWS

> and doing that is a service that it’s probably worth paying for.

Okay, but how many orders of magnitude?


Well, reverse it: Amazon is about $1K per 10TB.

That's an expensive disk drive.


Thinking of it as a disk drive is fallacious


Because this is what the market agrees to bear.

They want you to keep all of your data in their cloud, do all your processing there using their services (because doing it elsewhere incurs expensive egress fees), and get paid handsomely for your need to actually serve the data to your customers.

This also makes a migration additionally expensive, because you would need to egress all your data (old logs, some fresh backups, etc) in a short while.

So it's basically a soft lock-in.


> Because this is what the market agrees to bear.

I think it's more than that because a lot of folks aren't aware of the costs until they need to or want to move and then they get hit with a massive bill.

So maybe they chalk it up to experience and pay the bill because they have no other choice or calculated it's still better in the long term.

To me that's a much different scenario than walking into a store and happily buying a dozen eggs for anywhere between $1 and $1.50. In that case you know what you're getting into before you make the purchase, and everyone around you (other customers and businesses) decided that's what eggs will sell for in the open market. With outgoing data fees, it's more like a "take it or leave it" price dictated by the provider while they already have your data, and there's no price competition since they are the sole business with your data.

Whether or not it's the customer's fault for not doing enough research is debatable, but it certainly doesn't help that most providers make it pretty difficult to calculate costs.


That's still the market.

The market isn't exclusively good, and has many failure modes. This is one of them.


While true, this dilutes the value of the statement. At that rate every price point “is the market”, and nothing of further interest can ever be said about counter intuitive pricing.

Sure, it’s the market. Why? Is it a Giffen good? Is it a case of very poor visibility of services to consumers? ...? There is something interesting going on, let’s figure it out.


There are some bandaid solutions people have used to migrate away from this without taking a major hit. One is utilizing the free outbound bandwidth from Lightsail, and bigger transfers can use Snowball, which is 'only' $0.03/GB outbound + shipping.


A major issue is that in most markets outside of Europe, peering is less common, and transit prices are significantly higher.

Europe is probably the most competitive and best market in this case, almost all other markets are dominated by monopolies. Yes, even significantly worse than the Telekom monopoly.

Look at transit in Singapore, Japan or Australia. Or even in Brazil. You’ll go bankrupt even from trying to deliver a single movie to customers.


Actually Singapore and Australia aren't that terrible anymore.

Hurricane Electric has POPs in both Singapore and Australia. Transport between Singapore and Australia is $1.50 or less. Peering ports are $0.20 per Mbps.

If all you want to do is push movies at customers, there are plenty of dedicated server providers who will sell you bandwidth on the cheap in both countries.

Obviously YMMV if you want better routes or direct interconnects with local monopolies.


Well that's exactly the issue, HE often doesn't have peering with local monopolies, or at best only a 10G link, because those monopolies don't even care.


That's one of the big money makers, plus it locks you in, so there is zero chance they will change it.


> Is there a plausible explanation why egress fees from cloud providers costs around $0.1/GB?

Isn't DigitalOcean a cloud provider ($0.01/GB)? Isn't Oracle a cloud provider (first 10 TB free, $0.0085 after)? Isn't OVH a cloud provider (free bandwidth)?


Unless I'm making a mistake someplace, Digital Ocean includes 1TB egress to the internet from a $5-per-month host. On Google Cloud Platform that egress would cost $120/month by itself.

https://www.digitalocean.com/pricing/#Compute

https://cloud.google.com/compute/network-pricing


What you find is that if you spin up 1000 of these $5 hosts (i.e. $5000 per month) and try to egress 1PB per month, you can't. Either they disconnect you, the systems can't actually handle it, or there's some other gimmick.

One thing folks like about AWS - you can actually know what you will be paying and there is no fake / hidden limits.

That's my somewhat outdated experience wasting a TON of time on this idea ages ago.


But going back to the original number of $0.01/GB, that's what they state upfront as a bandwidth overage cost. I would be very surprised if they disliked customers paying the overage fee. And that's still only $10k per petabyte, vastly cheaper than AWS.

Give them a warning if you're going to spike your bill that high, but there shouldn't be any fake/hidden limits.


Which means higher pricing at GCP/AWS/Azure can't be explained by higher availability, DO offers the same.


When your link is bargain basement C* "good luck" peering, it can be super cheap and honestly will likely get everything most users need.

When you care about packet level SLOs... yeah, you start to shop around.


But how many applications on AWS really need that? It's probably a fraction of bandwidth that'd need such a SLO.


I was just reading about advisor termination fee terms which were some substantial multiple of all fees incurred over some past number of years.

People under pressure make short term optimizations at the expense of long term strategy. There's nothing more to it. AWS reduced compute and IT expenses in the next few fiscal years, so people jumped all in on it. Solve the problem now. Someone will fix it in the future.


Programmers should campaign for hefty termination fees. Sure, you can fire me, but you'd have to pay me one year's salary.

Actually, it would probably just make some programmers even more intolerable.


Instead of options, ask for severance terms at your next job interview.


I doubt you'll find you can consistently push or pull 1Gbps from a Hetzner server. I certainly can't. That's not to say they're a bad deal, but their total bandwidth is nowhere near what the major cloud providers can give you.


Cannot confirm this claim.

I've always got full 1Gbps out of Hetzner, even across the ocean, and for periods of multiple days of transfers.

(Like on all machines, you need to set the right TCP settings for windows sizes to make it physically possible.)


Why can't you? I'm doing this regularly (backups & restore). I get reliably 1gbit/s from other providers, also from US (with multiple connections of course). Haven't come across a provider where speed would be significantly below that.


Also, Hetzner offers hosting that works sometimes/maybe. I only consider it a viable option in the earliest days of starting a project.


I have a few dedicated servers and a few VPSes there. This year I had less than 5 minutes of downtime on one of the VPSes and 100% uptime for all the rest. Do you have more details about your experience?


I've actually been working on a library to help mitigate cloud storage lock-in. The idea is to treat cloud storage providers like disks are treated in RAID.

For example, say you have 3 separate cloud providers. Providers 1 and 2 have every other byte of data striped across them, and provider 3 has parity data. To pull a file you only need 2 of the 3 providers. If you don't like how a provider is treating or charging you, just pull from the other 2 and keep the third as a backup in case one goes down. You can also remove a provider entirely from the equation, but then you have no redundancy if one of the others goes down.

It gives you a lot of negotiating power to lower egress costs, because you can pull a provider out of the equation at any time and reinstate it once you get better pricing.


> every other byte of data striped across

I assume you didn't mean that literally, because I can't see how that would ever work out in terms of CPU cost. I think breaking it up into blocks like RAID 4/5/6 do would be better, but it will still impact read performance.

The performance of writes is going to be worse. Not because of the parity calculation but because you will be taking the max latency over all the cloud providers.

I can't see people trading off that much performance for better fault tolerance (in a world where S3 guarantees 11 nines) or ease of switching.


Yeah, RAID 4/5/6 are planned for the future. The plan is to offer all of them and let developers choose what's best for their application. RAID 0/2/3 are not CPU efficient, but are great for privacy and security: no cloud provider has the full picture, so none of them can spy on your data, and if one has a data leak it won't be anything useful. RAID 1 gives great fault tolerance with no extra latency (except on failures) and prevents vendor lock-in.
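A minimal sketch of the striping-plus-parity idea, assuming simple XOR parity as in RAID 4 (illustrative only, not the actual library): any one of the three pieces can be lost and the file still reconstructed from the other two.

```python
def split_with_parity(data: bytes):
    """Stripe data across two 'providers' and compute XOR parity for a third."""
    a = data[0::2]                      # every other byte to provider 1
    b = data[1::2]                      # remaining bytes to provider 2
    # Pad the shorter stripe so the XOR lines up byte-for-byte.
    b_padded = b + b"\x00" * (len(a) - len(b))
    parity = bytes(x ^ y for x, y in zip(a, b_padded))
    return a, b, parity

def rebuild_missing(present_stripe: bytes, parity: bytes) -> bytes:
    """Recover a lost stripe by XORing the surviving stripe with the parity."""
    padded = present_stripe + b"\x00" * (len(parity) - len(present_stripe))
    return bytes(x ^ y for x, y in zip(padded, parity))

def join(a: bytes, b: bytes, length: int) -> bytes:
    """Re-interleave the two stripes into the original byte stream."""
    out = bytearray(length)
    out[0::2] = a[: (length + 1) // 2]
    out[1::2] = b[: length // 2]
    return bytes(out)

data = b"hello cloud storage"
a, b, parity = split_with_parity(data)
# Provider 2 goes away: rebuild its stripe from provider 1 plus the parity.
b_rebuilt = rebuild_missing(a, parity)[: len(b)]
assert join(a, b_rebuilt, len(data)) == data
```

Per-byte striping is simple to show; a real implementation would use fixed-size blocks to keep the CPU and request overhead sane, as the parent comment notes.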


Pretty sure paying 3 cloud providers is more expensive than 1 cloud provider


It's actually cheaper when you want globally redundant storage. Cloud providers often charge twice as much for geo-redundant data. RAID 2/3/4 offer geo-redundancy but only take up 1.5 times as much space, so instead of paying twice as much you only end up paying 1.5 times as much, because you can get away with locally redundant pricing. If you're large enough you'll actually save money by having more negotiating power, since you can walk away from a provider at any given time.


Sounds interesting. Link?

I'd probably use commodity VMs for this rather than big clouds if it is indeed resilient.


Are you doing this to replace a CDN? There are already 3rd party CDNs like CloudFlare.

If you are doing it as a replacement for traffic within an AWS region and availability zone, it seems like you will be both more expensive and have much higher latency.

Or is the application something else entirely?


It's something else entirely. It's a mixed-cloud approach combining the storage offerings of Azure, Google Cloud, and S3 providers. The idea is not to trust any one cloud provider to provide fair pricing and proper redundancy.

Right now I'm implementing RAID 0, 1, and 3. Applying RAID 3 to the cloud gives you higher latency and more processor and memory usage, because the file has to be reassembled on the client machine. However, if you apply RAID 1 to the cloud your latency is similar, because each cloud provider has the full file. In the case of RAID 1 the library uploads a full copy to each provider and downloads files by trying providers until one succeeds.

If you only use two providers your pricing is usually the same, because geo-redundancy within a single storage provider is often twice the cost, and you're getting geo-redundancy built in by having multiple providers in different regions. RAID 3 is actually cheaper, because you have geo-redundancy but you're only storing 1.5 times as much data.
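The RAID 1 read path described here ("trying providers until one succeeds") is essentially a fallback loop. A hypothetical sketch, with made-up provider names and fetch functions:

```python
def download_with_fallback(key, providers):
    """Try each mirrored provider in turn; return the first successful read.

    `providers` is a list of (name, fetch_fn) pairs, where fetch_fn(key)
    returns bytes or raises on failure. Names and functions are illustrative.
    """
    errors = {}
    for name, fetch in providers:
        try:
            return fetch(key)
        except Exception as exc:          # real code would narrow this
            errors[name] = exc
    raise RuntimeError(f"all providers failed for {key!r}: {errors}")

# Toy usage: the first "provider" is down, the second serves the object.
def gcs_fetch(key):
    raise IOError("region outage")

def s3_fetch(key):
    return b"object-bytes"

data = download_with_fallback("backups/db.tar",
                              [("gcs", gcs_fetch), ("s3", s3_fetch)])
assert data == b"object-bytes"
```

Because every mirror holds the full object, the happy path costs one request and no client-side reassembly, matching the "similar latency" claim above.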


Sounds a little like gluster?


Sounds exactly what is Tahoe-LAFS for.


Yeah it's pretty similar. However, I'm focusing entirely on the cloud, keeping the package lightweight, and giving the consuming application decisions on how to store the data based off their needs.


Currently use this across cloud providers.


"The charges don’t appear to be a case of cloud providers gouging their customers"

I disagree on this one. The margins on egress are, well...egregious.


If I'm not mistaken AWS has the highest egress rates of the major cloud providers.


All of them (AWS, GCP, Azure) are priced outrageously compared to high quality, well peered, bandwidth.


GCP is pretty good though. Cold-potato, very fat backbone, and very good presence at a ton of PoPs. When using GCP you basically have the same global, high-bandwidth direct connectivity presence that Google uses for its products, and that is very difficult to match by traditional T1 ISPs.


Ok, help me out: "cold-potato"? :D


Vs "hot potato"

Hold onto the packet for as long as you can vs hand it off to your peer as quickly as possible.

Most networks do "hot", Google does "cold" since their network is almost always better than that of the peer.


The origin of which, for those who aren't familiar, is a game called "hot potato" where you try to pass a ball around as quickly as possible, as if it were a hot potato.


Weird terminology. Thanks for explaining though


It is the inverse of “hot potato” routing where the network tries to get rid of a packet as soon as possible (that is, drop a hot potato). Cold potato means the network keeps the packet on-network as long as it can.


Cold potato is not necessarily better; actually it's oftentimes worse than hot potato, and it's usually used to lower costs so you don't have to pay other people for transit.

For example, a cold potato network may have a link from Dallas to Chicago to New York, while a hot potato network could have a direct link from Dallas to New York.

Cogent uses cold potato and is frequently worse than other transit providers.


Google's cold potato is very good though.

(Also, they offer the option to use hot potato and pay them less: https://cloud.google.com/network-tiers/docs/overview )


Thanks, you made my day.

I'm a Google network SRE, but perhaps I'll get new business cards saying "cold potato engineer".


So it's hard to value the premium for google cold potato specifically, if it outclasses everything else.

But their hot potato still costs $65+ per TB at medium volumes and $45+ per TB at high volumes. That is still extremely high compared to normal peering costs.


It does have advantages, but it's not always worth the difference of "several cents per GB" versus "fractions of a cent per GB".


Walmart is generally cheaper than a steak restaurant. YMMV, and they both have their uses. Its worth going to both.


That's a very odd analogy, and overstating the difference.


My company was getting massive bills on AWS S3 egress. It was one of the reasons we moved to Wasabi for bucket storage; we then had to deal with a huge one time hit for egress, but in the long run the short term cost was worth it.


Do you think Wasabi is sustainable? I looked at them but find it hard to believe they can sustain free unlimited egress forever.


I like the Cloudflare/Backblaze duo. Presumably data stays on Cloudflare's network for as long as possible and goes to Backblaze via direct links, so Backblaze can provide free egress to Cloudflare customers (and customers of other services like Packet, etc.), while charging others.

This seems more sustainable than Wasabi's model, but there's no way of knowing for sure.


I just put 1 TB in AWS/Azure/GCP's cost calculators.

AWS (US East 1, no free tier) - $92.07

Azure (East US) - $88.65

GCP (Americas) - $122.88

I'm quite surprised by GCP being the highest cost here, and by such a wide margin.


Backblaze has good rates, but it's only really good for storage... which is their main use. I am also surprised GCS (GCP?) is so high. Last I checked AWS was the highest; maybe they cut their rates. I am a pretty happy Azure customer. My only real complaint is their service plans being subpar, but now I've switched over to their IaaS model and it's much better.


The point is to make it easy to move data in and hard to move it out.


Yup. It is like a gravity well. Escape velocity is your egress data charges ;)


Doctor Who: World Enough and Time


Someone should set up a big fat pipe right outside of Amazon data centers with free unlimited transfers, get data on behalf of customers copied to hard drives from AWS, and then attach those hard drives to the Uber pipe. I bet that service could work for a short & glorious moment in time.


Amazon beat you to it: https://aws.amazon.com/snowball/


Snowball still charges ~3 cents/GB to get it out of S3 and into Snowball.



Keeping track of all the different offerings AWS has is a full time job.


Someone does it for you https://www.lastweekinaws.com/

I am completely unaffiliated with this site, I just enjoy it


Cloud architect is a title as a result.


It literally is.

I've been a developer for 10 years and AWS seems to me like its intentionally designed to be as messy as possible.


Totally agree. AWS should learn something from DO instead of racking up services. The inconsistent UI completely sucks; it makes it very difficult to track resources and billing. Resources from different regions have to be managed separately, and billing is far too complicated. I always feel AWS is hype. Very few applications need scaling like Netflix, but since it's now the industry standard, people adopt it blindly without considering their use cases.


Fun fact: Netflix only hosts its website on AWS. The video stream is delivered from their very own CDN, which is completely self-engineered by them and orders of magnitude cheaper than AWS: https://openconnect.netflix.com/


No prices. Don't underestimate the bandwidth of a fully loaded tractor trailer, 21st-century style.


I'm disappointed that one doesn't appear to be available via API.


That would still require egress.


How about just sending a Freedom of Information Act request to the NSA asking them for a copy of your data? I think this is a joke...


Government agencies can (and often do) charge FOIA processing fees. Not that this has anything to do with cloud egress charges though.


I was going to say it'd be an improvement, but I can't actually make sense of the AWS Snowball pricing.

https://aws.amazon.com/snowball/pricing/


The pricing seems relatively clear:

A flat fee for the act of a human getting the data onto the physical medium ($200) + the cost of shipping (<$100?) + $15 per day you keep the Snowball device past the first + a price per GB of data you're transferring ($0.03/GB).

So if you're getting out 30 TB of data, that's $200 + ~$100 + ($0.03 * 30000) = ~$1200
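
That arithmetic, as a quick sketch (fees as quoted in this thread; check AWS's current Snowball pricing page before relying on these numbers):

```python
def snowball_cost(tb, service_fee=200.0, shipping=100.0,
                  extra_days=0, per_gb=0.03):
    """Rough Snowball export cost using the fees quoted above:
    flat job fee + shipping + $15/day past the first + per-GB
    transfer-out-of-S3 charge."""
    return service_fee + shipping + extra_days * 15 + per_gb * tb * 1000

print(snowball_cost(30))                 # 30 TB -> about $1200
print(snowball_cost(30, extra_days=4))   # holding it 4 extra days adds $60
```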


I was confused by the “Standard Amazon S3 storage and request pricing applies.” That seems to be a different thing from the extra S3 transfer costs.


Does Amazon want that, though? There are efforts for other providers to avoid bandwidth cost: https://www.cloudflare.com/bandwidth-alliance/ but Amazon isn't in there


Just like a roach motel!


or Hotel California ...


At some point everyone realized they could charge a massive premium per bit and as long as everyone did it, customers would have to pay. So here we are.


Customers are also to blame, when comparing the costs of two services they tend to look at the cost of an instance hour, or lambda execution and often don't look at transfer costs.

Even if a cloud provider had competitive transfer costs they likely wouldn't attract any new customers and would have less margin left over to subsidize the main cost customers look at, $ per instance hour.

The less attention is paid to transfer costs the better for AWS/GCP/Azure. Why hasn't a spot-market for transfer been introduced? Same reason why I can't sell my unused home internet bandwidth to my neighbors, the money is in controlling the means of transportation/communication and the providers want to keep as tight a control on that as possible.


I wouldn't blame customers when the pricing for data transfer looks like this: https://raw.githubusercontent.com/open-guides/og-aws/master/...

(Source Open Guide to AWS - https://github.com/open-guides/og-aws)


This seems a little deliberately obtuse -- for example, showing two arrows from an EC2 instance to an EC2 instance that exits the VPC. But I generally don't find this too hard to follow? Traffic within an AZ is generally free, but there are some cases where it's not and they generally make sense to me (leaving the VPC, pushing data from your CDN back upstream, etc.)

Then again, I worked for AWS for years, so maybe I'm just used to thinking this way so I'm not really surprised by it.


There was a time when you paid for available bandwidth. Then network operators realized they could oversell their capacity and not spend the money to upgrade their network.

You still see paying for bandwidth with residential connections, though some operators (like Comcast) are trying to do away with it.


>But I generally don't find this too hard to follow?

This is just the static picture though. What's harder to predict are the consequences of some innocuous looking code change.


Sure, but surely teams have monitoring on their usage, right? With automated rollbacks or at least one click manual rollbacks?


Sure, but rolling back work that was already done is a waste of development resources.


Disagree. It's much more wasteful to have an outage. Roll back asap, fix the issue, roll forward, do post mortem, grow as an organization. Never repeat the same mistake.


We're obviously talking past each other.

What I'm saying is that for a hosting architecture to make it difficult to predict the cost of any code change is a downside compared to an architecture that makes such predictions easy and intuitive.

Of course you will try to mitigate any downsides and learn what you can from any mistakes. But unpredictability makes learning far more difficult than it should, which inevitably means a waste of development resources.


I don't think this is true. At $JOB the extent of our cloud cost management is me reading a breakdown by SKU and looking for obvious inefficiencies, and we are very aware of transit fees. I would imagine that anyone in the 5MM+ range has actual models that account for this stuff.


I think this has more to do with collusion than consumer behavior. On average consumers are very rational, even if their rationality is hard to explain.

The issue with per-bit pricing is that a fair agreement for network use would probably look like paying a fee that makes up for the amortization of the network equipment. Anything else is an artificially restricted market created in an attempt to extract more value out of consumers by having them bid against each other.

At some point, yes, we will run out of places to put the switches and routers and then the cost of connectivity will be closer to the cost of land use and will mimic rent, but we are a ways away from that.


Why do you think that bandwidth costs should only cover the hardware? What about the electricity, rent, payroll, sales, marketing, administrative staff, insurance, accountants, lawyers, etc.


Well, by hardware I meant its maintenance as well. Doing so still leads you initially to the sale of bandwidth, not a bidding system.


> Why hasn't a spot-market for transfer been introduced?

Enron tried to create a market for this.

[0] https://www.wired.com/2001/11/enron-a-bandwidth-bloodbath/


I recently saw a talk by a couple of former Google employees who have a business helping cloud customers save money, and they were saying that they see a lot of money being wasted by companies in overprovisioning and neglecting to shut down or delete unused resources like VMs or virtual disks.

Some of their advice for saving money was to keep track of who created each resource and why, so there's less reason to doubt whether an apparently unused resource can be deleted, and to make some limits regarding how many resources can be automatically created (especially in dev environments). Some other ideas were to look for signs of inactivity like low CPU or bandwidth use, and consider deleting such little used resources.

There was much more to the talk, but those were some of the highlights that I can remember without digging out my notes. It was a good talk.


Makes sense. If you're not properly tagging your resources then it can be very hard to track down if/where it's been used, how often, or when it was used last. You can automate/template as much orchestration as you want with stuff like Terraform to bring up well-defined resources, but there will always be outliers without tagging.


> A person close to AWS said its data transfer charges reflect a range of technology costs customers would normally pay if they weren’t using cloud services, including fiber optic connections, networking hardware devices and software, cybersecurity services and network monitoring software.

I'd believe this more if the pricing of bandwidth on AWS hadn't stayed pretty flat since its launch.

Plus, it's frustrating that AWS Lightsail (https://aws.amazon.com/lightsail/pricing/) offers a $3.50/month plan with 1 TB of transfer. That terabyte alone will cost you $92.07 on a normal instance, and the $3.50 includes storage and an instance!


It does appear like price gouging. Once you're in, you're locked in.

Any chance some of those high costs are due to movement of data due to GDPR compliance? Maybe Apple did all its prep in 2017.

Also there are some misconceptions on inter-AZ data transfer as well: (https://www.lastweekinaws.com/blog/aws-cross-az-data-transfe...)


I assure you there is nothing sinister about the price asymmetry. Fiber and routers have as much bandwidth coming as going. The inbound traffic is mostly composed of little http(s) requests. The outbound is full of images and mountains of JavaScript. Cloud providers don't charge for ingress because they got it for free when they grew their egress capability to meet demand.


When was the last time AWS reduced egress rates?

And they could negotiate the best rates on the planet given their scale.


I get the impression cloud providers don't "negotiate rates" but rather "build infrastructure" most of the time.


They don't own many of their data centers. Locations were leaked a while ago, most are colocated in major data centers. They won't dig new trenches and lay cables between those but rather negotiate rates for existing fiber.


A fairly large optimization, if you're smaller and a large amount of your outbound data is cacheable, is to run a Varnish cache on one of the clouds that gives you "free" bandwidth.

i.e. a $20/mo instance on Linode gets you 4 TB of transfer -- $0.005 per gig. Scale enough of these in various DCs around the world and you have a pretty cheap self-hosted CDN.
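
A back-of-envelope comparison, using only the prices quoted in this thread (the ~$0.09/GB figure is the low-volume internet egress rate people cite for the big three):

```python
def per_gb(monthly_usd, transfer_tb):
    """Effective $/GB of a flat-rate instance with included transfer."""
    return monthly_usd / (transfer_tb * 1000)

linode_rate = per_gb(20, 4)   # $20/mo with 4 TB included
cloud_egress = 0.09           # ~$0.09/GB big-cloud egress

print(linode_rate)                  # 0.005
print(cloud_egress / linode_rate)   # big-cloud egress is ~18x pricier
```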


a $20/mo old server from Hetzner gives you unlimited 1Gbps - that's 324 TB/mo


For $1.67/mo, you can get unlimited 1Gbps here:

https://servercheap.net/crm/cart.php?gid=10

Or if you don't trust "unlimited", Hetzner sells CX11 with 20TB/mo for 2.96 eur/mo

https://www.hetzner.com/cloud


Good deal - I do most of my work in the US, I don't know of a similar deal here.


Lots of places that have 1gbps bare metal unmetered boxes for ~$200/mo, too. At 10% utilization that's about $0.005/gig.


any in the US?



e.g. https://www.wholesaleinternet.net/dedicated/

$20/mo for 100TB metered ($0.0002/gigabyte). $120/mo for 200TB metered ($0.0006/gigabyte). $170/mo for 1gbps unmetered ($0.00085/gigabyte at 200TB, $0.00052/gigabyte at full saturation one way).

or https://client.layeronline.com/cart.php?a=confproduct&i=0

$199/mo 1 gbps unmetered.


Plenty. Google it or trawl offers on Webhostingtalk.


Maybe I missed it but where did they get this info? Was it stolen from AWS?

As far as I know, Apple would never share cost data like this, nor would a lot of others on this list. Apple doesn’t even publicly acknowledge that they are customers.


>"The chart, which is based on internal AWS sales figures obtained by The Information,"

I'm not sure that "stolen" is the right word but, yes, it certainly appears based on what The Information wrote that either someone at AWS or a third-party with access to the info (probably not too likely) leaked the confidential numbers. Disgruntled former employee or... Who knows.


I recently talked with a friend about a very cheap way to egress from AWS, at $2.50 per TB, using the $5 AWS Lightsail instance, which includes 2 TB of egress per instance. This friend extracted 200 TB of backups for $500 instead of $17,000!
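
A rough sketch of that math, assuming the quoted $5 plan with 2 TB of included transfer (Lightsail plans and quotas change, so verify before trying this):

```python
import math

def lightsail_egress_cost(total_tb, plan_usd=5.0, included_tb=2.0):
    """Cost of draining `total_tb` through parallel Lightsail
    instances, each with `included_tb` of egress included.
    Returns (instance_count, total_cost)."""
    instances = math.ceil(total_tb / included_tb)
    return instances, instances * plan_usd

print(lightsail_egress_cost(200))  # 200 TB -> 100 instances, $500
```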


Reading this, I immediately thought: the cost has to go way down with 200TB of volume.

It goes down, but only to $14,131.11!

Nice find. I will file this away if I ever need to do a full remote recovery.


That's against Lightsail TOS, if you care.


To save everyone else a lookup:

> 66.3. You may not use Amazon Lightsail in a manner intended to avoid incurring data fees from other Services (e.g., proxying network traffic from Services to the public Internet or other destinations or excessive data processing through load balancing Services as described in the Documentation), and if you do, we may throttle or suspend your data services or suspend your account.

Source: https://aws.amazon.com/service-terms/


You can do it somewhat cheaper than direct upload with snowballs, at ~$33/TB (S3 + 80TB/$250 shipping), but it's still very pricey


So where is your friend's data in? s3? ebs? I'm curious what is the cost to transfer data from s3/ec2 to Lightsail.


Mostly in S3 but S3/EC2 to lightsail is free within the same region :)


I tried, but it seems AWS has closed the door. To connect EC2 to Lightsail via private IP I have to create an "EC2-Classic VPC" in EC2, but AWS has disabled "EC2-Classic VPC" in all regions.


From what I know, you can peer the Lightsail (AWS-managed) VPC with your normal, default EC2 VPC via the Lightsail console. Then you launch EC2 instances in the default VPC for the transfer.


Wow, does it really cost 4-5x more to download a 1 GB file from S3 or Google Cloud Storage than it does to store it for a month? That is mind-blowing.


Bandwidth is expensive, storage is cheap.


cloud providers charge a lot more for bandwidth yes, but in reality bandwidth is much, much cheaper than storage for them to provide.


Amazon Snowball, if I understand it correctly, is a box full of hard drives that they mail you to transfer data. So no bandwidth there.

It still costs $0.03/GB, which is on top of the $200 they charge you to use the service.


Not with smaller cloud providers or colocation centers.


Just curious, how on earth did Kevin and Amir get such incredibly detailed AWS spending data for these accounts?

Either they put it together from public resources, or (more worrisome) someone at AWS leaked them the line-item expenses of their top customers.

This has got to be incredibly sensitive data for AWS, not something they'd want leaked. If I'm (say) AirBnB or Snap, I'd worry that this data leaks information that would be detrimental to their ability to negotiate for cloud computing with Google, Microsoft etc.


The article specifically says it's based on internal data.


does anyone know of a cheap service to order large amounts of data by mail on physical media?

I am often interested in some dataset for which I can afford the storage in the form of a hard drive but not in the form of a download through my home connection.

If a service existed that simply offered the following:

* customer provides URL (and optionally hash checksum)

* customer pays, and later receives hard disk drive / SSD drive with the download contained

* democratic pricing for the media, or alternatively send your own media (hence at twice media shipping cost...)

* possibly eventually local brick and mortar locations / affiliate locations to drop off and pick up media, in the larger cities

* preferably without account, although an account is not a large impediment

* definitely not coupled to a financial account in a credit fashion, i.e. no qualms with topping up the account, but the service should not be able to withdraw money from my financial account. i.e. debit only (like the typical european bank cards, yes I know credit cards are available in europe as well...)

This would seem like a profitable side business for many programmer types who have high-bandwidth connections (or have access to them and are allowed to use the connection for this purpose).

If someone builds the software stack for a main portal such that affiliates can advertise their physical location, and their pricing for media, for download and for copying to media, then customers could compare and choose on the basis of price.


I don't think it would be a cheap service if you set a modest goal of earning a cinema trip with popcorn and a drink for every trip to the post office.

1TB HDD: 59 AUD

International shipping (assuming we can keep it under 1kg with packaging): 38 AUD

That is 97 AUD before considering any profit you might want to make (35 AUD for the movie + small popcorn and drink), fixed setup costs like a NAS to cache data sets or the bandwidth costs.


If someone hails a cab on Uber, they don't hail a cab from a different continent...

there seems to be plenty of opportunities here, like buy or rent an old bank building with the individually lockable drawers, put your drive in the locker which has a USB cable or ethernet cable, close the locker, use some app to set the URL / hash, pay, and you get a ETA, download complete notification, and a deadline to pick it up (or else incur a fee to unlock the drawer proportional to overtime).

I guess the idea could be pitched to those operating rentable local PO boxes, using similar lockers, but with internet connectivity.


While it doesn't fit all of your criteria AWS does offer something like that with AWS Snowball: https://aws.amazon.com/snowball/


The only thing worse than the data transfer costs is explaining them to your non-cloud-savvy stakeholders. Go ahead, try explain this sh*t to your board.


AWS Azure or Google cloud data transfer: 9 cents per GB

Digital Ocean: 1 cent per GB

The really weird thing is this should be the absolute lead on all Digital Ocean marketing but they don't even mention it. It's their single biggest selling point.


For a service I built on AWS Lambda, the vast majority of the cost was just data egress. Unfortunately I haven't found a competitor that offers the same kind of features that I'm using Lambda for (running golang with additional custom binaries).


Azure is part of Cloudflare's Bandwidth Alliance[1]. If you use Azure's serverless functions and put your service behind Cloudflare, you'll get free egress.

[1] https://www.cloudflare.com/bandwidth-alliance/


I was aware of the Bandwidth Alliance, though looking at their KB, for now it's only discounted egress [1]. Same with Google: though not part of the Bandwidth Alliance, data delivered through their CDN Interconnect program starts at $0.04/GB (NA) [2]

[1] https://support.cloudflare.com/hc/en-us/articles/36001614391...

[2] https://cloud.google.com/interconnect/docs/how-to/cdn-interc...


so, bandwidth cartel


If you're at under 400GB/month, Netlify may be an option, their Functions are AWS Lambdas and the paid plan ($25/month) comes with that traffic included. It's $20/100GB for extra, though, so it stops making sense for greater scale.


Both Azure and GCP let you run arbitrary containers in a serverless fashion. GCP Cloud Run is per-request too.


Yes but both still charge crazy money for the bandwidth.

(Although Azure is a part of the Bandwidth Alliance if you use Cloudflare/B2)


It's nickel and diming all the way. I always thought the S3 pricing was sort of ridiculous. Pay for the amount you store (fair). Pay for the egress data (less fair but ok...). Pay for the number of API calls! (lol!)


It wasn't true at first but people abused the hell out of it, for example, by storing data only in the filename. The API cost prevents very inefficient use of S3.
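
To see why request pricing matters, a back-of-envelope sketch. The $0.005 per 1,000 PUTs figure is illustrative; check the current S3 pricing page:

```python
def s3_put_cost(objects, put_per_1000=0.005):
    """Request-pricing cost of writing `objects` separate objects,
    at an illustrative ~$0.005 per 1,000 PUT requests."""
    return objects * put_per_1000 / 1000

# The same 1 GB of data, written as one object vs. a million
# tiny objects (e.g. data smuggled into filenames):
print(s3_put_cost(1))          # fractions of a cent
print(s3_put_cost(1_000_000))  # ~$5 in request fees alone
```

The per-request fee is tiny per call but makes pathological usage patterns (millions of near-empty objects) pay roughly in proportion to the load they put on the service.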


Data transfer fees are the biggest reason that stopped me from using AWS/GCP/Azure. Anyone using Oracle Cloud? I found their bandwidth is really cheap.


The thing about that is that it's Oracle. You might survive stuffing your hand in a hornet's nest, but it's unlikely.


Cloud providers can launch and advertise and advocate every little product they're able to refactor out of the prevailing development and systems engineering practices, but transit remains expensive, and basically consistently so. Not only that, but AWS transit is like 20x the price of boring old hosting providers, providers who don't also charge for egress.


Even the huge-looking bill for Apple is only 6.5% of their total bill. Moving data into AWS is free, so it's not totally surprising that they pay for that by making other stuff more expensive.

I found it more interesting just to see a list of their top ten customers. In particular I didn't realize that Capital One had so much infrastructure.


> Even the huge-looking bill for Apple is only 6.5% of their total bill. Moving data into AWS is free, so it's not totally surprising that they pay for that by making other stuff more expensive

Moving data into any network that is outbound heavy is free because both paid peering and transit is settled based on a peak percentile traffic (unless it is flat rate).

That's why the "gangsta" position is to have a balanced in/out for any network, as in that case you get to effectively double-charge for the same pipes -- your eyeball-heavy customers pay for incoming and your web-farm customers pay for outgoing.


This doesn't deserve to be downvoted, it makes a good point! I had not thought about the likely-to-be-outbound-heavy nature of cloud providers and how that affects things.


My GP's practice went to a SaaS provider for electronic health records. Never mind transmission costs -- there is no standard way to export their patient records in order to move to a different provider. It requires a custom consulting services contract with their existing provider to retrieve their patient records, at an exorbitant price tag.


This is quite typical. Even with MIPS PI measures and the push for interoperability. Getting patient data in bulk out of a hosted EHR can be a frustrating exercise. What they are doing is holding the providers' data hostage, in the name of HIPAA and security.


Cross AZ traffic:

https://www.lastweekinaws.com/blog/aws-cross-az-data-transfe...

No connection to the blog.


It sounds very similar to physical documents/records storage service “roach motel”-like fee structures.

Make it nearly free to move documents in.

Charge monthly rent per box of documents.

But charge like a wounded bull if customers try to permanently remove documents.


We mostly deal with this by keeping the data inside an AWS/Azure region as much as possible. You pay to get it in there, and you pay for storage, but you can access it for free within the same region.


Anyone (no longer under NDA) know if Apple intends to migrate user iCloud data from AWS and GCP object stores to their own data centers at some point?


> no longer under NDA

Do NDAs expire? AFAIK you sign an NDA and you’re bound to respect that forever. Unless the knowledge becomes publicly available.


IANAL

I have spoken to a lot of lawyers about contract terms and what I've been told for both NDA and Non-compete style agreements is that they have to have very clearly defined scope to be enforceable.

For a non-compete for example, it requires distance and specific field with a clearly defined time period (that's reasonable).

For an NDA you have to ensure that the covered information is explicitly labeled (which is why people have confidentiality labels in email footers) and that there is a defined time to expiration. An NDA without those two criteria place an undue burden on a person who is not being compensated for their compliance.

Reasonable time period is generally 2 years.


This is typically true of all contract law. "In perpetuity" is not a thing and if it is, it usually nullifies the contract.


I mean, it depends on the language of the NDA.

As a quick example, this random free template I found online[0] uses "5 years".

[0] https://www.docracy.com/1/generic-nda


Question for any lawyers here, do NDA's have any form of "natural expiration" if no explicit expiration date is set?

What happens to them if the company goes out of business?


The $0.01/GB inter-availability-zone data transfer cost can be a killer for misconfigured workloads. It's a shame AWS doesn't make this more clear on their pricing page. I've seen people run Spark clusters across multiple AZs, incurring a huge costs whenever a shuffle happens.
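
For a sense of scale, a rough sketch. Note that AWS bills cross-AZ transfer in each direction, so traffic between two AZs effectively costs $0.02/GB; the `both_directions` flag below models that assumption:

```python
def cross_az_shuffle_cost(shuffle_tb, per_gb=0.01, both_directions=True):
    """Rough cross-AZ transfer cost of shuffling `shuffle_tb` TB
    between instances in different availability zones."""
    rate = per_gb * (2 if both_directions else 1)
    return shuffle_tb * 1000 * rate

# A 10 TB Spark shuffle across AZs -> ~$200, every time it runs:
print(cross_az_shuffle_cost(10))
```

Pinning the whole cluster to a single AZ makes that same shuffle free, at the cost of losing AZ-level fault tolerance.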


What's up with Apple's spend? Did they move most of their data out of AWS?


My take is also that they were moving out. The article does acknowledge that they spent a fraction of that on the next year.


Or they optimized via direct connect.


I've worked at at least one company where we found using Direct Connect to a data center somewhere and egressing via that data center instead of amazon saved us significant bandwidth costs.


Amazon still charges for traffic through Direct Connect (about 50%), don't they?


2 cents per GB vs 9 cents per GB.


Where is the article getting these numbers from? Did they say and I missed it?


where's Corey Quinn


Cheap shovels and expensive ladders.


Amazon has gone one step past that and made the shovels expensive too. The fear of buying a computer has made them billions.


It's not just the fear of buying a computer, it's also the fear of hiring staff...


I don't think Amazon really provides any products that are going to prevent you from hiring staff. That's the domain of companies like Zendesk or Salesforce.

All they provide you is the bare metal in easily-purchasable quantities. Whatever time you saved not having to lug a server up to your datacenter and plug it in will be spent debugging CloudFormation stacks or figuring out how to auction off your reserved instances that you no longer want.


Obviously you'll still need some staff. But suddenly you don't need staff that knows how to handle physical server rooms and data centers, you limit it to AWS which covers a whole range of needs which would need to be addressed by different staff with different skillsets.


>The fear of buying a computer has made them billions.

Are you saying that AWS service aren't providing value to their customers but it's fear-based marketing?


I think you read too much into that comment. AWS didn't have to market the fear into the customers. People are naturally afraid of things. AWS does provide value, but it comes at a steep cost.


Yes.


Fear of buying a computer? Or awesomeness of instantly having a computer appear from the ether when you need it?


[flagged]


I didn't see this earlier, but need to let you know that comments like this will get you banned here. We want thoughtful conversation on HN. Please read https://news.ycombinator.com/newsguidelines.html and don't post like this on HN.


The cloud is like the Hotel California. Can checkout anytime but never leave.



