I recently started running more Google Cloud VMs in the UK instead of the US. It was costing me £2 a day to run the VMs, but somehow I also paid £12 a day for "GCP Storage egress between NA and EU".
Turns out that by default Google's Docker container registry only stores the Docker images in the US. So each time I launched a VM the Docker image was downloaded from the US. I wrote more about it here: https://www.mattzeunert.com/2019/10/13/reducing-docker-image...
The billing interface didn't show that the Cloud Storage cost was related to the Docker images. I was investigating my normal Cloud Storage use, but it didn't explain why I was being charged so much. Only after a few days did I get the idea that it might be the Docker images that were causing it.
It's not really clear to me what that would mean. Auto-replicating customer data like docker images across national borders is a non-starter for legal compliance reasons. That's got to be an opt-in, not an opt-out. So they can't just point outbound GCR requests at the EU registry and assume the customer's images are there.
It's not a mirror, it's just another place to put your images. If you're EU-based then it's definitely better to use eu.gcr.io for everything from the start.
Yep, I started copying my images to the EU and Asia regions and now it's fine cost-wise. They don't have an Australia region for the container registry though, so launching VMs there is still pricey for me.
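For anyone hitting the same thing, the copy is just a pull/retag/push; eu.gcr.io is the EU regional registry host (the project and image names below are placeholders):

```shell
PROJECT="my-project"   # placeholder GCP project ID
IMAGE="app:latest"     # placeholder image name and tag

# Pull from the default (US-hosted) registry...
docker pull "gcr.io/${PROJECT}/${IMAGE}"

# ...retag it for the EU regional registry...
docker tag "gcr.io/${PROJECT}/${IMAGE}" "eu.gcr.io/${PROJECT}/${IMAGE}"

# ...and push, so VMs in EU regions pull without transatlantic egress.
docker push "eu.gcr.io/${PROJECT}/${IMAGE}"
```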
Is there a plausible explanation why egress fees from cloud providers cost around $0.1/GB? "Traditional" server providers such as Hetzner are able to offer bandwidth at orders of magnitude lower prices (e.g. $1.1/TB). I understand that cloud providers may have better interconnects or better uptimes, but that doesn't justify the magnitudes higher pricing.
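To put a number on "orders of magnitude" using the two figures above:

```python
# Cloud egress at ~$0.10/GB vs. a traditional provider at ~$1.10/TB.
cloud_per_tb = 0.10 * 1000   # $0.10/GB is $100 per TB
hetzner_per_tb = 1.1         # overage price quoted above

markup = cloud_per_tb / hetzner_per_tb
print(f"~{markup:.0f}x more expensive per TB")  # → ~91x more expensive per TB
```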
This is, oddly enough, similar to a debate people have about consumers TV or Internet: should pricing be "unlimited" or "a la carte"?
AWS is combining all your networking charges into one lump "outgoing data transfer" fee. So it's heavily marked up in comparison to what they're paying for the outgoing data transfer, and you're not sure how much is profit vs. whether it's going to cover all their other costs.
So it might be fairer if AWS broke out separate line items for internal, incoming and outgoing data transfer, plus all the additional systems a customer uses.
I think AWS's billing is probably already on the falling side of diminishing marginal returns. That is, it's complex enough that more information would tend to hinder customers from getting the best price. Right now, if I plan to reduce my data charges, I have one variable to tinker with. If we expand this, it would mean I'm having to balance incoming / internal and outgoing charges. That sounds simple, but in terms of engineering it can be very complex.
The next claim is that this biases customers not to move. Of course, Azure and GCP have the same arrangement, so while you pay to move out of AWS, you don't pay to move into Azure or GCP. All the vendors are attempting to lock you into their product while trying to extricate you from their competitors; overall it's a wash.
So, yes, part of the motivation for egress charges is that ingress is a loss leader. But it's also true that egress is a metric that does, for the vast majority of their customers, directly translate into customer value. If there's a compelling case for doing it differently, someone should do it and see if it works.
> If there's a compelling case for doing it differently, someone should do it and see if it works.
Cloudflare doesn't charge for bandwidth. I always throw Cloudflare on top of anything I do, not because I really need a CDN or anything, but because the bandwidth cost would bankrupt me otherwise. The CEO of Cloudflare gave the rationale for why they don't charge:
> There's a fixed cost of setting up those peering arrangements, but, once in place, there's no incremental cost. That's why we have similar agreements to Backblaze in place with Google, Microsoft, IBM, Digital Ocean, etc. It's pretty shameful, actually, that AWS has so far refused. When using Cloudflare, they don't pay for the bandwidth, and we don't pay for the bandwidth, so why are customers paying for the bandwidth? Amazon pretends to be customer-focused. This is a clear example where they're not.
According to Cloudflare, they do not have any bandwidth pricing arrangement with Microsoft for Azure users.
They also do charge for Enterprise plans, but instead of transparent pricing I got high-pressure sales techniques and black box pricing offers - which then anchored our rate so that as we grow past our current contract, we're forced to upgrade at any point with pricing based solely on our original negotiation.
Frankly, while I save money using Cloudflare over Azure's CDN right now, it's left a very sour taste in my mouth and I'll be jumping their ship as soon as I have time to find a suitable alternative.
Cloudflare most certainly disables zones on the free plan that use excess bandwidth. Enterprise contracts are also negotiated based on transit and those prices mirror comparable CDN services.
All that tells us is that cloudflare has a different revenue stream. Amazon is a business and they are in the business of making money. If they weren't charging for egress bandwidth they'd just charge for something else.
"I think AWS's billing is probably already on the falling side of diminishing marginal returns. That is, it's complex enough that more information would tend to hinder customers from getting the best price. Right now, if I plan to reduce my data charges, I have one variable to tinker with."
No, it's two variables - the egress charges you refer to and the actual cost to store the data.
We[1] have found that it is, as you might expect, quite a bit simpler to charge for just the storage and forget about metering the usage/bandwidth/transfer.
So we have typically had our price point higher than the B2s or Wasabis of the world, but there's just one simple number to think about - and no potential for surprises in the billing.
I will admit to having a bit of concern over adding 'rclone'[2] to our platform and the potential for users to just burn bandwidth using an rsync.net account as a "transfer host", but that is why we peer with he.net and their cheap and plentiful 10Gb pipes.
And how many PoPs, regional interconnects, highly available high-throughput connections, and cross-continent highly available links do you have? Do you detect failure across these connections? Do you detect grey failures? Do you have a team of infrastructure engineers to look after this network?
> So it might be fairer if AWS broke out separate line items for internal, incoming and outgoing data transfer, plus all the additional systems a customer uses.
The example given above for comparison, Hetzner, also doesn't charge for inbound and internal transfer AFAIK. Nor "the additional systems a customer uses". You pay a charge for the server, you get some amount of traffic included, and if you go over, the additional traffic costs something like $1.1/TB. That's all you pay.
> AWS is combining all your networking charges into one lump "outgoing data transfer" fee
> So it might be fairer if AWS broke out separate line items for internal, incoming and outgoing data transfer
This explanation doesn't cut it for me - most (all?) "traditional" VPS providers don't charge for ingress traffic, and I doubt anyone, ever, has charged for internal traffic.
So what exactly is 'all the networking charges' comprised of, other than egress data?
AWS is vast. I have no idea what their overall accounting for networking looks like. Even for the tiny service I worked on, it would be tough to guess at what our overall costs were. We actually had an internal bill each month for all the regular AWS services we used, but then there were a host of internal services we depended on.
That companies don't charge for specific things doesn't mean those things don't cost them anything. It just means they're trying to work out a pricing scheme that scales with customer usage and is broadly understandable. So "data egress" is really just a proxy for "how much stuff you're doing with the networking subsystems of AWS."
Same thing with EC2, there are a whole pile of costs that are summed up with "time you rented an instance."
See a lower comment I made here; what I really want is a little transparency about pricing.
Of course there is an internal cost of doing business, and peripheral infrastructure costs - but if I pay $100 for service "A" I reasonably expect that fee to pay for service "A". Instead, egress bandwidth costs seem to be used to trick customers into thinking services are cheaper than they really are.
I guess there aren't that many global network providers - I'm not even sure how much fiber Amazon owns in Japan, Australia or Northern Europe for that matter.
And while they have a call-us price list (if you have to ask...) - they at least state:
"Public and Private high-capacity networking options up to 10Gbps. Note: there is no charge for internal data center traffic.
Cost on a per-GB-out model"
I have no idea what they charge per GB for this cloud product, however.
I get your point, but then why does AWS charge for inter-AZ traffic? That seems like an "egress but not really" kind of thing. If AWS/GCP stopped charging for this, customers would be incentivized to build HA systems and distribute their workloads across AZs, which is a win for both customers and you (since capacity is now spread instead of stuck in one zone).
> Of course, to make them fully independent, you have to replicate everything, so you wind up buying several redundant copies of your system...
Yeah, and keeping around warm systems ready to failover in case of a zonal outage seems like a preposterous waste of resources.
The alternative... to keep around multiple replicas of your system in different zones, all ready to accept traffic and which do serve traffic, seems more practical and less wasteful.
Sure, from the perspective of availability, it makes sense to keep this around. The US military has redundancies in place to handle many kinds of adverse scenarios, which comes at a price, but is justifiable. The point I'm trying to make is that if availability is the _only_ value, it becomes hard to justify that if you're a scrappy for-profit corp looking at your bottom line.
If, instead of availability being the only value, there were more value provided by actually using such resources, more folks would adopt cross-AZ architectures, which would be a win-win for both the customer (get HA for lower or no cost, go down less often, and succeed in the market) and thus the cloud provider (keep raking in steady cloud revenue as the customer grows).
> so you wind up buying several redundant copies of your system
This is one of those gotchas that companies hit. They see the public pricing page and think "wow, that is much cheaper than what my internal IT department charges for X", and then when they go to actually implement it they find that "best practice" says they basically have to more than double or even triple the cost to get a reliable system (more, because not only do you have to duplicate all the infrastructure into a second AZ, you are also getting charged for the replication traffic between them).
But in all fairness, if you actually implement that "best practice" HA infrastructure, you will also be miles ahead of almost all internal IT departments.
Because AWS is designed to be like a pitcher plant, shaped like a funnel with inward-facing hairs to make it easy to get in, difficult to get out.
That's pretty much it - in my experience most of AWS' awesome features are designed to lock customers into an environment where Amazon increasingly provides all of the components and services you need to do business.
I would assume the endgame is to create an ecosystem where the vast majority of customers allow their IT function (infrastructure, developing software, engaging third-party SaaS vendors, etc) to atrophy entirely, after which they'll have no choice but to buy what Amazon is selling, at any price, in perpetuity.
That depends on how much saturation you can get. Most end users/companies can't saturate this reliably, or have very spiky traffic. Only the biggest players can minimise their seasonality enough to get close to this, and doing that is a service that's probably worth paying for.
People forget just how affordable it can be to maintain your own infrastructure. You can have the hardware and network capable of supporting 10X your average traffic loads and still have it operate far more cost effectively than the equivalent traffic on AWS.
At my work, we slashed our overall hosting costs by moving a data warehouse off of AWS and on to our own self-maintained infrastructure.
But with a bloated inefficient IT department or non-savvy negotiations with hardware vendors or transit providers, it can also be more expensive than AWS.
I have had this debate with so many people, AWS is not cheaper than your own hosting. Period.
What you get from AWS is the logistics pipeline is already built as is the infrastructure should you suddenly require to serve factors of traffic more.
The ROI of AWS comes from the backend and capex vs opex debates.
Example: the CFO can go to the board during any meeting and explain that we're getting ready to reduce opex by laying off 10 developers at $150k/yr. The capex cost is usually fixed and hard to explain away.
I think the best case for AWS is, counter to the marketing, very small companies. Their native offerings do make things great for a while when you're starting out.
The other is stupendous burst activity, like when you just need thousands of cores for a couple of hours. Of course this doesn't mean the baseload has to be in AWS; it's just easy for small teams.
I hear more and more cases of businesses moving parts off of the cloud, back on-premises. I think, in a wider perspective, it's going to play back and forth. The 2010's were 'forth' towards the cloud, the 2020's might see a peak back on-prems, etc.
It's generally a function of network cost versus infra cost, and so the 'equation' solves differently based on the longest tech cycles, from inception to maturity to skill pool to diminishing returns and then back again on some other mode — this really is a decade+ thing.
Some "always true" inherent advantages of one approach (e.g. availability for cloud, or resources for on-prems) would remain across cycles as permanent gains; disruption then occurs when some new approach (e.g. containerization) fundamentally upsets the order of costs.
> It's generally a function of network cost versus infra cost, and so the 'equation' solves differently based on the longest tech cycles,
It used to be that OpEx was easy and CapEx was hard to get approved. As people took advantage and OpEx went through the roof, there's now a lot more pressure to reduce OpEx.
Pushing back on OpEx in favor of CapEx might be seen as a more long-term strategy too. Basically rent vs. purchase, even for depreciating assets like info systems, as I think the current DevOps trend (scaling in pure software, virtualization, etc.) makes it easier than ever to squeeze every last FLOP on-premises.
They want you to keep all of your data in their cloud, do all your processing there using their services (because doing it elsewhere incurs expensive egress fees), and get paid handsomely for your need to actually serve the data to your customers.
This also makes a migration additionally expensive, because you would need to egress all your data (old logs, some fresh backups, etc) in a short while.
I think it's more than that because a lot of folks aren't aware of the costs until they need to or want to move and then they get hit with a massive bill.
So maybe they chalk it up to experience and pay the bill because they have no other choice or calculated it's still better in the long term.
To me that's a much different scenario than walking into a store and happily buying a dozen eggs for anywhere between $1 and $1.50. In that case you know what you're getting into before you make the purchase, and everyone around you (other customers and businesses) decided that's what eggs will sell for in the open market. With outgoing data fees, it's more like a "take it or leave it" price dictated by the provider while they already have your data, and there's no price competition since they are the sole business with your data.
Whether or not it's the customer's fault for not doing enough research is debatable, but it certainly doesn't help that most providers make it pretty difficult to calculate costs.
While true, this dilutes the value of the statement. At that rate every price point “is the market”, and nothing of further interest can ever be said about counter intuitive pricing.
Sure, it’s the market. Why? Is it a Giffen good? Is it a case of very poor visibility of services to consumers? ...? There is something interesting going on, let’s figure it out.
There are some band-aid solutions people have used to migrate away from this without taking a major hit. One is utilizing the free outbound bandwidth from Lightsail, and bigger transfers can use Snowball, which is 'only' $0.03/GB outbound + shipping.
A major issue is that in most markets outside of Europe, peering is less common, and transit prices are significantly higher.
Europe is probably the most competitive and best market in this case, almost all other markets are dominated by monopolies. Yes, even significantly worse than the Telekom monopoly.
Look at transit in Singapore, Japan or Australia. Or even in Brazil. You'll go bankrupt just trying to deliver a single movie to customers.
Actually Singapore and Australia aren't that terrible anymore.
Hurricane Electric has POPs in both Singapore and Australia. Transport between Singapore and Australia is $1.50 per Mbps or less. Peering ports are $0.20 per Mbps.
If all you want to do is push movies at customers, there are plenty of dedicated server providers who will sell you bandwidth on the cheap in both countries.
Obviously YMMV if you want better routes or direct interconnects with local monopolies.
Well that's exactly the issue, HE often doesn't have peering with local monopolies, or at best only a 10G link, because those monopolies don't even care.
Unless I'm making a mistake someplace, Digital Ocean includes 1TB egress to the internet from a $5-per-month host. On Google Cloud Platform that egress would cost $120/month by itself.
What you find is that if you spin up 1000 of these $5 hosts (i.e., $5,000 per month, which is pretty substantial) and try to egress 1PB per month, you can't. Either they disconnect you, the systems can't actually handle it, or there's some other gimmick.
One thing folks like about AWS - you can actually know what you will be paying and there is no fake / hidden limits.
That's my somewhat outdated experience wasting a TON of time on this idea ages ago.
But going back to the original number of $0.01/GB, that's what they state upfront as a bandwidth overage cost. I would be very surprised if they disliked customers paying the overage fee. And that's still only $10k per petabyte, vastly cheaper than AWS.
Give them a warning if you're going to spike your bill that high, but there shouldn't be any fake/hidden limits.
I was just reading about advisor termination fee terms which were some substantial multiple of all fees incurred over some past number of years.
People under pressure make short term optimizations at the expense of long term strategy. There's nothing more to it. AWS reduced compute and IT expenses in the next few fiscal years, so people jumped all in on it. Solve the problem now. Someone will fix it in the future.
I doubt that you'll find you can consistently push or pull 1Gbps from a Hetzner server. I certainly can't. That's not to say that they're a bad deal, but their total bandwidth is nowhere near what the major cloud providers can give you.
Why can't you? I'm doing this regularly (backups & restore). I get reliably 1gbit/s from other providers, also from US (with multiple connections of course). Haven't come across a provider where speed would be significantly below that.
I have a few dedicated servers and a few VPSes there; this year I had less than 5 minutes of downtime on one of the VPSes and 100% uptime for all the rest. Do you have more details about your experience?
I've actually been working on a library to help mitigate cloud storage lock-in. The idea is to treat cloud storage providers like disks are treated in RAID. For example, you have 3 separate cloud providers. Cloud providers 1 and 2 have every other byte of data striped across them. Cloud provider 3 has parity data. To pull a file you only need 2 of the 3 cloud providers. If you don't like how a cloud storage provider is treating you or charging you, just pull from the other 2 providers and keep the third as a backup in case one goes down. You can also remove a provider entirely from the equation, but then you have no redundancy if one of the others goes down. It gives you a lot of negotiating power to lower egress costs because you can pull a provider out of the equation at any time and reinstate them once you get better pricing.
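As a toy sketch of the striping-plus-parity idea the commenter describes (byte-level, RAID-3-style; the function names are mine, and a real implementation would stripe large blocks rather than single bytes):

```python
def split_with_parity(data: bytes):
    """Stripe data across two 'providers', with XOR parity on a third."""
    a = data[0::2]                    # provider 1: even-indexed bytes
    b = data[1::2]                    # provider 2: odd-indexed bytes
    b_pad = b.ljust(len(a), b"\x00")  # pad so parity bytes line up
    parity = bytes(x ^ y for x, y in zip(a, b_pad))  # provider 3
    return a, b, parity

def recover_stripe(a: bytes, parity: bytes, total_len: int) -> bytes:
    """Rebuild provider 2's stripe from providers 1 and 3 (XOR is self-inverse)."""
    b_pad = bytes(x ^ y for x, y in zip(a, parity))
    return b_pad[: total_len // 2]

def reassemble(a: bytes, b: bytes) -> bytes:
    """Interleave the two stripes back into the original byte stream."""
    out = bytearray()
    for i, byte in enumerate(a):
        out.append(byte)
        if i < len(b):
            out.append(b[i])
    return bytes(out)
```

Any single provider can drop out and the file is still recoverable, which is the negotiating leverage described above.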
I assume you didn't mean that literally, because I can't see how that would ever work out in terms of CPU cost. I think breaking it up into blocks like RAID 4/5/6 do would be better, but it will still impact read performance.
The performance of writes is going to be worse. Not because of the parity calculation but because you will be taking the max latency over all the cloud providers.
I can't see people trading off that much performance for better fault tolerance (in a world where S3 guarantees 11 nines) or ease of switching.
Yeah, RAID 4/5/6 are planned for the future. The plan is to offer all of them and let developers choose what is best for their application. RAID 0/2/3 are not CPU-efficient, but are great for privacy and security: no cloud provider has the full picture, so none of them can spy on your data, and if one has a data leak it won't leak anything useful. RAID 1 gives great fault tolerance with no extra latency (except on failures) and prevents vendor lock-in.
It's actually cheaper when you want globally redundant storage. Cloud providers often charge twice as much for globally redundant data. RAID 2/3/4 offer global redundancy but only take up 1.5 times as much space, so instead of paying twice as much you only end up paying 1.5 times as much, because you can get away with locally redundant pricing. If you're large enough you'll actually save money by having more negotiating power, since you can walk away from a provider at any given time.
Are you doing this to replace a CDN? There are already 3rd party CDNs like CloudFlare.
If you are doing it as a replacement for traffic within an AWS region and availability zone, it seems like you will be both more expensive and have much higher latency.
It's something else entirely. It's a mixed cloud approach combining the storage offerings of Azure, Google Cloud, and S3 providers. The idea is not to trust one cloud provider to provide fair pricing and proper redundancy. Right now I'm mirroring RAID 0,1, and 3. Applying RAID 3 to the cloud is going to give you higher latency and more processor and memory usage because the file has to be reassembled on the client machine. However, if you apply RAID 1 to the cloud your latency is similar because each cloud provider has the full file. In the case of RAID 1 the library will upload a full copy to each cloud provider and will download files by trying providers until one succeeds. If you only use two providers your pricing is usually the same because geo-redundancy in storage providers is often twice the cost and you're getting geo-redundancy built in by having multiple providers in different regions. RAID 3 is actually cheaper because you have geo-redundancy, but you're only storing 1.5 times as much data.
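The RAID 1 read path described above is just try-in-order with failover; a minimal sketch (the provider callables are stand-ins for real SDK clients):

```python
def fetch_mirrored(key, providers):
    """RAID-1-style read: try each provider in turn, so any single
    provider outage is invisible to the caller."""
    errors = []
    for name, get in providers:
        try:
            return get(key)
        except Exception as exc:
            errors.append((name, exc))  # remember failure, move on
    raise IOError(f"{key!r} unavailable on all providers: {errors}")
```

Ordering the provider list by egress price means the cheapest healthy provider serves every read, which is exactly the pricing leverage being described.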
Yeah it's pretty similar. However, I'm focusing entirely on the cloud, keeping the package lightweight, and giving the consuming application decisions on how to store the data based off their needs.
GCP is pretty good though. Cold-potato, very fat backbone, and very good presence at a ton of PoPs. When using GCP you basically have the same global, high-bandwidth direct connectivity presence that Google uses for its products, and that is very difficult to match by traditional T1 ISPs.
The origin of which, for those who aren't familiar, is a game called "hot potato" where you try to pass a ball around as quickly as possible as if it was a hot potato
It is the inverse of “hot potato” routing where the network tries to get rid of a packet as soon as possible (that is, drop a hot potato). Cold potato means the network keeps the packet on-network as long as it can.
Cold potato is not necessarily better; actually it's often worse than hot potato, and it's usually used to lower costs so you don't have to pay other people for transit.
For example, a cold potato network may have a link from Dallas to Chicago to New York, while a hot potato network could have a direct link from Dallas to New York.
Cogent uses cold potato and is frequently worse than other transit providers.
So it's hard to put a value on the premium for Google's cold potato specifically, even if it outclasses everything else.
But their hot potato still costs $65+ per TB at medium volumes and $45+ per TB at high volumes. That is still extremely high compared to normal peering costs.
My company was getting massive bills on AWS S3 egress. It was one of the reasons we moved to Wasabi for bucket storage; we then had to deal with a huge one time hit for egress, but in the long run the short term cost was worth it.
I like the Cloudflare/Backblaze duo. Presumably data stays on Cloudflare's network for as long as possible and goes to Backblaze via direct links, so Backblaze can provide free egress to Cloudflare customers (and customers of other services like Packet, etc.), while charging others.
This seems more sustainable than Wasabi's model, but there's no way of knowing for sure.
Backblaze has good rates, but it's only really good for storage... which is their main use case. I am also surprised GCS (GCP?) is so high. Last I checked AWS was the highest; maybe they cut their rates. I am a pretty happy Azure customer. My only complaint really is their service plans being subpar, but now I've switched over to their IaaS model and it's much better.
Someone should set up a big fat pipe right outside of Amazon data centers with free unlimited transfers, get data on behalf of customers copied to hard drives from AWS, and then attach those hard drives to the Uber pipe. I bet that service could work for a short & glorious moment in time.
Totally agree. AWS should learn something from DO instead of racking up services. The inconsistent UI completely sucks; it makes it very difficult to track resources and billing. Resources from different regions need to be managed separately, and billing is far too complicated. I always feel AWS is overhyped. Very few applications need scaling like Netflix, but as it is now the industry standard, people are adopting it blindly without considering their use cases.
Fun fact: Netflix only hosts their website on AWS; the video stream is delivered from their very own CDN, which is completely self-engineered by them and multiple orders of magnitude cheaper than AWS: https://openconnect.netflix.com/
A flat fee for the act of a human getting the data onto the physical medium ($200) + the cost of shipping (<$100?) + $15 per day you keep the snowball device past the first + price per GB of data you're transferring ($0.03 per/GB).
So if you're getting out 30 TB of data, that's $200 + ~$100 + ($0.03 * 30,000) ≈ $1,200
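Or as a quick calculator using the figures above (the $100 shipping number is the commenter's rough guess):

```python
def snowball_export_cost(tb: float, days_kept: int = 1,
                         shipping: float = 100.0) -> float:
    """Back-of-envelope Snowball export estimate from the figures above."""
    service_fee = 200.0                        # flat fee for loading the device
    late_days = max(days_kept - 1, 0) * 15.0   # $15/day past the first day
    data_fee = tb * 1000 * 0.03                # $0.03 per GB transferred out
    return service_fee + shipping + late_days + data_fee

print(snowball_export_cost(30))  # → 1200.0
```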
At some point everyone realized they could charge a massive premium per bit and as long as everyone did it, customers would have to pay. So here we are.
Customers are also to blame, when comparing the costs of two services they tend to look at the cost of an instance hour, or lambda execution and often don't look at transfer costs.
Even if a cloud provider had competitive transfer costs they likely wouldn't attract any new customers and would have less margin left over to subsidize the main cost customers look at, $ per instance hour.
The less attention is paid to transfer costs the better for AWS/GCP/Azure. Why hasn't a spot-market for transfer been introduced? Same reason why I can't sell my unused home internet bandwidth to my neighbors, the money is in controlling the means of transportation/communication and the providers want to keep as tight a control on that as possible.
This seems a little deliberately obtuse -- for example, showing two arrows from an EC2 instance to an EC2 instance that exits the VPC. But I generally don't find this too hard to follow? Traffic within an AZ is generally free, but there are some cases where it's not and they generally make sense to me (leaving the VPC, pushing data from your CDN back upstream, etc.)
Then again, I worked for AWS for years, so maybe I'm just used to thinking this way so I'm not really surprised by it.
There was a time when you paid for available bandwidth. Then network operators realized they could oversell their capacity and not spend the money to upgrade their network.
You still see paying for bandwidth with residential connections, though some operators (like Comcast) are trying to do away with it.
Disagree. It's much more wasteful to have an outage. Roll back asap, fix the issue, roll forward, do post mortem, grow as an organization. Never repeat the same mistake.
What I'm saying is that for a hosting architecture to make it difficult to predict the cost of any code change is a downside compared to an architecture that makes such predictions easy and intuitive.
Of course you will try to mitigate any downsides and learn what you can from any mistakes. But unpredictability makes learning far more difficult than it should, which inevitably means a waste of development resources.
I don't think this is true. At $JOB the extent of our cloud cost management is me reading a breakdown by SKU and looking for obvious inefficiencies, and we are very aware of transit fees. I would imagine that anyone in the 5MM+ range has actual models that account for this stuff.
I think this has more to do with collusion than consumer behavior. On average consumers are very rational, even if their rationality is hard to explain.
The issue with per-bit pricing is that a fair agreement for network use would probably look like paying a fee that makes up for the amortization of the network equipment. Anything else is an artificially restricted market created in an attempt to extract more value out of consumers by having them bid against each other.
At some point, yes, we will run out of places to put the switches and routers and then the cost of connectivity will be closer to the cost of land use and will mimic rent, but we are a ways away from that.
Why do you think that bandwidth costs should only cover the hardware? What about the electricity, rent, payroll, sales, marketing, administrative staff, insurance, accountants, lawyers, etc.
I recently saw a talk by a couple of former Google employees who have a business helping cloud customers save money, and they were saying that they see a lot of money being wasted by companies in overprovisioning and neglecting to shut down or delete unused resources like VMs or virtual disks.
Some of their advice for saving money was to keep track of who created each resource and why, so there's less reason to doubt whether an apparently unused resource can be deleted, and to make some limits regarding how many resources can be automatically created (especially in dev environments). Some other ideas were to look for signs of inactivity like low CPU or bandwidth use, and consider deleting such little used resources.
There was much more to the talk, but those were some of the highlights that I can remember without digging out my notes. It was a good talk.
Makes sense. If you're not properly tagging your resources then it can be very hard to track down if/where it's been used, how often, or when it was used last. You can automate/template as much orchestration as you want with stuff like Terraform to bring up well-defined resources, but there will always be outliers without tagging.
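A minimal sketch of that kind of audit, assuming the resource inventory has already been exported into a list of dicts (the field names, required tags, and CPU threshold here are all made up for illustration, not any provider's API):

```python
# Sketch: flag cloud resources that are candidates for deletion, assuming an
# inventory has already been exported (e.g. from a tagging or billing report).
# The dict shape and thresholds are illustrative only.

REQUIRED_TAGS = {"owner", "purpose"}
CPU_IDLE_THRESHOLD = 0.05  # 5% average CPU over the lookback window

def deletion_candidates(resources):
    """Return resources that are untagged or look idle."""
    flagged = []
    for r in resources:
        missing = REQUIRED_TAGS - set(r.get("tags", {}))
        idle = r.get("avg_cpu", 1.0) < CPU_IDLE_THRESHOLD
        if missing or idle:
            flagged.append({"id": r["id"],
                            "missing_tags": sorted(missing),
                            "idle": idle})
    return flagged

inventory = [
    {"id": "vm-1", "tags": {"owner": "ana", "purpose": "ci"}, "avg_cpu": 0.40},
    {"id": "vm-2", "tags": {}, "avg_cpu": 0.01},
    {"id": "disk-7", "tags": {"owner": "bob"}, "avg_cpu": 0.00},
]
print(deletion_candidates(inventory))
```

The point is less the code than the policy: once every resource carries owner/purpose tags, "can we delete this?" becomes a query instead of a guessing game.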
> A person close to AWS said its data transfer charges reflect a range of technology costs customers would normally pay if they weren’t using cloud services, including fiber optic connections, networking hardware devices and software, cybersecurity services and network monitoring software.
I'd believe this more if the pricing of bandwidth on AWS hadn't stayed pretty flat since its launch.
Plus, it's frustrating that AWS Lightsail (https://aws.amazon.com/lightsail/pricing/) offers a $3.50/month plan with 1 TB of transfer. That terabyte alone will cost you $92.07 on a normal instance, and the $3.50 includes storage and an instance!
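The $92.07 figure checks out if you assume the standard EC2 egress tiers at the time (first 1 GB/month free, then $0.09/GB in the range below 10 TB):

```python
# Back-of-envelope check of the $92.07 figure, assuming the standard
# EC2 egress tiers: first 1 GB/month free, then $0.09/GB up to 10 TB.
def ec2_egress_cost(gb):
    billable = max(gb - 1, 0)          # first GB each month is free
    return round(billable * 0.09, 2)   # flat $0.09/GB in this range

print(ec2_egress_cost(1024))  # 1 TB
```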
I assure you there is nothing sinister about the price asymmetry. Fiber and routers have as much bandwidth coming as going. The inbound traffic is mostly composed of little http(s) requests. The outbound is full of images and mountains of JavaScript. Cloud providers don't charge for ingress because they got it for free when they grew their egress capability to meet demand.
They don't own many of their data centers. Locations were leaked a while ago, most are colocated in major data centers. They won't dig new trenches and lay cables between those but rather negotiate rates for existing fiber.
A fairly large optimization, if you're smaller and a large amount of your outbound data is cacheable, is to run a Varnish cache on one of the clouds that give you "free" bandwidth.
ie a $20/mo instance on linode gets you 4TB of transfer -- $0.005 per gig. Scale enough of these in various DCs around the world and you have a pretty cheap self-hosted CDN.
$20/mo for 100TB metered ($0.0002/gigabyte). $120/mo for 200TB metered ($0.0006/gigabyte). $170/mo for 1gbps unmetered ($0.00085/gigabyte at 200TB, $0.00052/gigabyte at full saturation one way).
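For comparison shopping, the effective rate is just monthly price divided by included transfer. Using the plans quoted above (and taking 1 TB = 1000 GB for simplicity):

```python
# Effective $/GB for the plans quoted above, at the prices as stated.
def per_gb(monthly_price, tb_included):
    return monthly_price / (tb_included * 1000)  # 1 TB = 1000 GB

print(per_gb(20, 4))     # Linode-style $20/mo with 4 TB
print(per_gb(20, 100))   # $20/mo for 100 TB metered
print(per_gb(120, 200))  # $120/mo for 200 TB metered
```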
Maybe I missed it but where did they get this info? Was it stolen from AWS?
As far as I know, Apple would never share cost data like this, nor would a lot of others on this list. Apple doesn’t even publicly acknowledge that they are customers.
>"The chart, which is based on internal AWS sales figures obtained by The Information,"
I'm not sure that "stolen" is the right word but, yes, it certainly appears based on what The Information wrote that either someone at AWS or a third-party with access to the info (probably not too likely) leaked the confidential numbers. Disgruntled former employee or... Who knows.
I recently talked with a friend about a very cheap way to egress from AWS: about $2.50 per TB, using the $5 AWS Lightsail instance, which includes 2 TB of egress per instance. This friend has extracted 200 TB of backups for $500 instead of $17,000!
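The arithmetic behind those figures, assuming $5/month Lightsail instances with 2 TB of included egress each, versus a blended EC2 egress rate of roughly $0.085/GB (the tiered rates average out somewhere around there at this volume):

```python
# Rough math behind the Lightsail-vs-EC2 egress comparison above.
import math

data_tb = 200
instances = math.ceil(data_tb / 2)   # 2 TB of included egress per instance
lightsail_cost = instances * 5       # $5/month each
ec2_cost = data_tb * 1000 * 0.085    # assumed blended tiered rate, ~$0.085/GB

print(instances, lightsail_cost, round(ec2_cost))
```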
> 66.3. You may not use Amazon Lightsail in a manner intended to avoid incurring data fees from other Services (e.g., proxying network traffic from Services to the public Internet or other destinations or excessive data processing through load balancing Services as described in the Documentation), and if you do, we may throttle or suspend your data services or suspend your account.
I tried, but it seems AWS has closed the door. To connect EC2 to Lightsail via private IP I would have to create an "ec2-classic VPC" in EC2, but AWS has disabled "ec2-classic VPC" in all regions.
From what I know, you can peer the Lightsail (AWS-managed) VPC with your normal, default EC2 VPC via the Lightsail console. Then you launch the EC2 instances for the transfer in the default VPC.
Just curious, how on earth did Kevin and Amir get such incredibly detailed AWS spending data for these accounts?
Either they put it together from public resources, or (more worrisome) someone at AWS leaked them line-item expenses of their top customers.
This has got to be incredibly sensitive data for AWS, not something they'd want leaked. If I'm (say) AirBnB or Snap, I'd worry that this data leaks information that would be detrimental to their ability to negotiate for cloud computing with Google, Microsoft etc.
does anyone know of a cheap service to order large amounts of data by mail on physical media?
I am often interested in some dataset for which I can afford the storage in the form of a hard drive but not in the form of a download through my home connection.
If a service existed that simply offered the following:
* customer pays, and later receives hard disk drive / SSD drive with the download contained
* democratic pricing for the media, or alternatively send your own media (hence at twice media shipping cost...)
* possibly eventually local brick and mortar locations / affiliate locations to drop off and pick up media, in the larger cities
* preferably without account, although an account is not a large impediment
* definitely not coupled to a financial account in a credit fashion, i.e. no qualms with topping up the account, but the service should not be able to withdraw money from my financial account. i.e. debit only (like the typical european bank cards, yes I know credit cards are available in europe as well...)
This would seem like a profitable side business for many programmer types who have high-bandwidth connections (or have access to them and are allowed to use the connection for this purpose).
If someone builds the software stack for a main portal such that affiliates can advertise their physical location, and their pricing for media, for download and for copying to media, then customers could compare and choose on the basis of price.
I don't think it would be a cheap service if you set a modest goal of earning a cinema trip with popcorn and a drink for every trip to the post office.
1TB HDD: 59 AUD
International Shipping (assuming we can keep it under 1kg with packaging): 38 AUD
That is 97 AUD before considering any profit you might want to make (35 AUD for the movie + small popcorn and drink), fixed setup costs like a NAS to cache data sets or the bandwidth costs.
If someone hails a cab on Uber, one doesn't hail a cab from a different continent...
There seem to be plenty of opportunities here. For example: buy or rent an old bank building with the individually lockable drawers, put your drive in a locker that has a USB or Ethernet cable, close the locker, use an app to set the URL/hash, and pay. You then get an ETA, a download-complete notification, and a deadline to pick it up (or else incur a fee to unlock the drawer, proportional to the overtime).
I guess the idea could be pitched to those operating rentable local PO boxes, using similar lockers, but with internet connectivity.
The only thing worse than the data transfer costs is explaining them to your non-cloud-savvy stakeholders. Go ahead, try explaining this sh*t to your board.
AWS Azure or Google cloud data transfer: 9 cents per GB
Digital Ocean: 1 cent per GB
The really weird thing is this should be the absolute lead on all Digital Ocean marketing but they don't even mention it. It's their single biggest selling point.
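At any real volume that gap dominates the bill. A sketch at an assumed 10 TB/month of egress (real bills are tiered and vary by region, so treat these as ballpark numbers):

```python
# What the per-GB rates quoted above mean at an assumed 10 TB/month out.
def monthly_bill(rate_per_gb, tb):
    return round(rate_per_gb * tb * 1000, 2)

for name, rate in [("big-three cloud", 0.09), ("Digital Ocean", 0.01)]:
    print(name, monthly_bill(rate, 10))
```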
For a service I built on AWS Lambda, the vast majority of the cost was just data egress. Unfortunately I haven't found a competitor that offers the same kind of features I'm using Lambda for (running golang with additional custom binaries).
Azure is part of Cloudflare's Bandwidth Alliance[1]. If you use Azure's serverless functions and put your service behind Cloudflare, you'll get free egress.
I was aware of the Bandwidth Alliance, though looking at their KB, for now it's only discounted egress [1]. Same with Google: though not part of the Bandwidth Alliance, data delivered through their CDN Interconnect program starts at $0.04/GB (NA) [2]
If you're at under 400GB/month, Netlify may be an option, their Functions are AWS Lambdas and the paid plan ($25/month) comes with that traffic included. It's $20/100GB for extra, though, so it stops making sense for greater scale.
It's nickel and diming all the way. I always thought the S3 pricing was sort of ridiculous. Pay for the amount you store (fair). Pay for the egress data (less fair but ok...). Pay for the number of API calls! (lol!)
It wasn't true at first but people abused the hell out of it, for example, by storing data only in the filename. The API cost prevents very inefficient use of S3.
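A toy cost model shows why: a million tiny objects cost far more to upload than the same bytes in one object. The rates here ($0.005 per 1,000 PUTs, $0.023/GB-month storage) are illustrative; check current pricing:

```python
# Why per-request pricing discourages tiny-object abuse: compare a million
# 1 KB objects with the same data in a single 1 GB object.
# Rates are illustrative, not a quote of current S3 pricing.
def s3_monthly_cost(num_objects, object_bytes,
                    put_per_1000=0.005, storage_per_gb=0.023):
    puts = num_objects / 1000 * put_per_1000            # upload requests
    storage = num_objects * object_bytes / 1e9 * storage_per_gb
    return round(puts + storage, 2)

print(s3_monthly_cost(1_000_000, 1_000))   # a million tiny objects
print(s3_monthly_cost(1, 1_000_000_000))   # the same data in one object
```

Storage is ~$0.02 either way; the request charge is what makes the tiny-object version two orders of magnitude more expensive to load.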
Cloud providers can launch and advertise and advocate every little product they're able to refactor out of the prevailing development and systems engineering practices, but transit remains expensive, and basically consistently so. Not only that, but AWS transit is like 20x the price of boring old hosting providers, providers who don't also charge for egress.
Even the huge-looking bill for Apple is only 6.5% of their total bill. Moving data into AWS is free, so it's not totally surprising that they pay for that by making other stuff more expensive.
I found it more interesting just to see a list of their top ten customers. In particular I didn't realize that Capital One had so much infrastructure.
> Even the huge-looking bill for Apple is only 6.5% of their total bill. Moving data into AWS is free, so it's not totally surprising that they pay for that by making other stuff more expensive
Moving data into any network that is outbound-heavy is free, because both paid peering and transit are settled based on peak-percentile traffic (unless it's flat rate).
That's why the "gangsta" position is to have a balanced in/out for any network, as in that case you get to effectively double-charge for the same pipes -- your eyeball-heavy customers pay for incoming and your web-farm customers pay for outgoing.
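The peak-percentile settlement referred to here is usually 95th-percentile billing: sample utilization every few minutes, discard the top 5% of samples, and bill on the highest remaining one. A minimal sketch:

```python
# Sketch of 95th-percentile ("burstable") billing: sort the samples,
# drop the top 5%, and bill on the highest remaining sample.
def p95_billable_mbps(samples_mbps):
    ordered = sorted(samples_mbps)
    idx = int(len(ordered) * 0.95) - 1   # index of the 95th-percentile sample
    return ordered[max(idx, 0)]

# 20 samples: one big spike that falls in the discarded top 5%,
# steady 100 Mbps otherwise.
samples = [100] * 19 + [900]
print(p95_billable_mbps(samples))  # the spike is forgiven, so 100
```

This is why short bursts are effectively free under peering/transit contracts, and why a cloud provider's per-GB retail pricing bears little resemblance to what they settle upstream.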
This doesn't deserve to be downvoted, it makes a good point! I had not thought about the likely-to-be-outbound-heavy nature of cloud providers and how that affects things.
My GP's practice went to a SaaS provider for electronic health records. Never mind transmission costs -- there is no standard way to export their patient records in order to move to a different provider. It requires a custom consulting services contract with their existing provider to retrieve their patient records, at an exorbitant price tag.
This is quite typical. Even with MIPS PI measures and the push for interoperability. Getting patient data in bulk out of a hosted EHR can be a frustrating exercise. What they are doing is holding the providers' data hostage, in the name of HIPAA and security.
We mostly deal with this by keeping the data inside an AWS/Azure region as much as possible. You pay to get it in there, and you pay for storage, but you can access it for free within the same region.
I have spoken to a lot of lawyers about contract terms and what I've been told for both NDA and Non-compete style agreements is that they have to have very clearly defined scope to be enforceable.
For a non-compete for example, it requires distance and specific field with a clearly defined time period (that's reasonable).
For an NDA you have to ensure that the covered information is explicitly labeled (which is why people have confidentiality labels in email footers) and that there is a defined time to expiration. An NDA without those two criteria places an undue burden on a person who is not being compensated for their compliance.
The $0.01/GB inter-availability-zone data transfer cost can be a killer for misconfigured workloads. It's a shame AWS doesn't make this more clear on their pricing page. I've seen people run Spark clusters across multiple AZs, incurring huge costs whenever a shuffle happens.
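A rough estimate of what such a shuffle costs, assuming $0.01/GB is billed in each direction (AWS charges both sides) and that with N AZs roughly (N-1)/N of shuffle traffic crosses a zone boundary; both are simplifying assumptions:

```python
# Rough cost of a cross-AZ Spark shuffle under the stated assumptions.
def cross_az_shuffle_cost(shuffle_tb, num_azs, per_gb_each_way=0.01):
    cross_fraction = (num_azs - 1) / num_azs   # traffic leaving its own AZ
    gb = shuffle_tb * 1000 * cross_fraction
    return round(gb * per_gb_each_way * 2, 2)  # billed on both sides

print(cross_az_shuffle_cost(10, 3))  # a single 10 TB shuffle across 3 AZs
```

Over a hundred dollars per 10 TB shuffle adds up fast on a busy cluster, which is why pinning Spark workers to one AZ is common advice.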
I've worked at at least one company where we found using Direct Connect to a data center somewhere and egressing via that data center instead of amazon saved us significant bandwidth costs.
I don't think Amazon really provides any products that are going to prevent you from hiring staff. That's the domain of companies like Zendesk or Salesforce.
All they provide you is the bare metal in easily-purchasable quantities. Whatever time you saved not having to lug a server up to your datacenter and plug it in will be spent debugging CloudFormation stacks or figuring out how to auction off your reserved instances that you no longer want.
Obviously you'll still need some staff. But suddenly you don't need staff that knows how to handle physical server rooms and data centers, you limit it to AWS which covers a whole range of needs which would need to be addressed by different staff with different skillsets.
I think you read too much into that comment. AWS didn't have to market the fear into the customers. People are naturally afraid of things. AWS does provide value, but it comes at a steep cost.
I didn't see this earlier, but need to let you know that comments like this will get you banned here. We want thoughtful conversation on HN. Please read https://news.ycombinator.com/newsguidelines.html and don't post like this on HN.