Amazon S3 will no longer charge for several HTTP error codes (amazon.com)
289 points by axyjo 14 days ago | 76 comments



AWS is full of dark patterns. You can sign up for the so-called "free" tier and then too easily, unwittingly enable something that suddenly charges you hundreds of dollars before you know it (by getting a bill at the end of the month), even if you're not doing anything with the service except looking around. AWS doesn't give any warning to free tier members that a configuration change is going to cost you, and their terms are also very confusing. For example, PostgreSQL is advertised as free, but "Aurora PostgreSQL" is quite costly.


> unwittingly enable something that suddenly charges you hundreds of dollars before you know it

The default is to have current and estimated monthly cost displayed on your root console as soon as you log in. You will also get email alerts when you hit 50% and then 80% of your "free tier quota" in a month.

> even if you're not doing anything with the service except looking around.

I'm not aware of any services which will cost you money unless you actively enable them and create an object within its class. Many services, such as S3, will attempt to force you into a "secured" configuration that avoids common traps by default.

> For example, PostgreSQL is advertised as free, but "Aurora PostgreSQL" is quite costly.

There's a rubric to the way AWS talks about its internal services that is somewhat impenetrable at first. It's not too hard to figure out, though, if you take the time to read through their rather large set of documentation. That's the real price you must pay to successfully use the "free tier."

Anyways, PostgreSQL is an open source project. Amazon RDS is a managed service that can run instances of it. Amazon Aurora is a different service that provides its own engine that is _compatible_ with MySQL and PostgreSQL.

To know why you'd use one or the other, the shibboleth is "FAQ," so search for "AWS Aurora FAQ" and carefully read the whole page before you enable the service.


"It's not too hard to figure out, though, if you take the time to read through their rather large set of documentation"

I don't even know where I would start with that. https://docs.aws.amazon.com/ lists documentation for 305 different products!


Well, presumably, from the service you decided to use first?


We have a slack channel, #aws-budget-alerts, where AWS sends a notification any time our forecasted spend reaches certain milestones or the actual spend reaches certain milestones.

It's a really easy app to set up!
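
One way to wire up the budget side of this, sketched with boto3 (a rough sketch, not necessarily the exact setup: the SNS topic is a placeholder for whatever AWS Chatbot relays into the Slack channel, and the amounts and thresholds are made up):

    import boto3

    budgets = boto3.client("budgets")
    account_id = boto3.client("sts").get_caller_identity()["Account"]

    # Monthly cost budget; AWS evaluates actual and forecasted spend against it.
    budgets.create_budget(
        AccountId=account_id,
        Budget={
            "BudgetName": "monthly-spend",
            "BudgetLimit": {"Amount": "100", "Unit": "USD"},  # placeholder amount
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        NotificationsWithSubscribers=[
            {
                # Fires when *forecasted* spend crosses 80% of the budget.
                "Notification": {
                    "NotificationType": "FORECASTED",
                    "ComparisonOperator": "GREATER_THAN",
                    "Threshold": 80.0,
                    "ThresholdType": "PERCENTAGE",
                },
                # SNS topic that AWS Chatbot forwards into #aws-budget-alerts.
                "Subscribers": [{
                    "SubscriptionType": "SNS",
                    "Address": "arn:aws:sns:us-east-1:123456789012:budget-alerts",
                }],
            },
        ],
    )

Add more entries to NotificationsWithSubscribers for the other milestones (ACTUAL vs. FORECASTED, 50%, 100%, and so on).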


Ooh, that sounds nice. Beats my current slew of emails around week 3 of the month. Got a link to any docs to get set up quickly?


> AWS doesn't give any warning

It does if you ask it to. You can get billing alerts if current costs are projected to go over a threshold.
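
For example, a minimal boto3 sketch of a CloudWatch alarm on month-to-date estimated charges (the AWS/Billing metric only exists in us-east-1 and has to be enabled in billing preferences first; the threshold and SNS topic are placeholders, and for genuinely forecast-based alerts AWS Budgets is the other route):

    import boto3

    # Billing metrics are only published in us-east-1.
    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

    cloudwatch.put_metric_alarm(
        AlarmName="estimated-charges-over-10-usd",
        Namespace="AWS/Billing",
        MetricName="EstimatedCharges",
        Dimensions=[{"Name": "Currency", "Value": "USD"}],
        Statistic="Maximum",
        Period=21600,                      # 6 hours; the metric only updates a few times a day
        EvaluationPeriods=1,
        Threshold=10.0,                    # placeholder dollar amount
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=["arn:aws:sns:us-east-1:123456789012:billing-alerts"],  # placeholder topic
    )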


You have to manually set this up though.

There's so much UX for a prospective new AWS dev that could be improved. Say a one-click "do you want billing alerts with that?" template, or a "do you want to lock down expensive non-free stuff?" option that sets some soft limits to zero out of the gate (necessitating a self-serve support case to unblock).

It's frustrating. It's been this way for over a decade, yet you'll still see new customers cutting themselves on the nuances of the free tier. I get that AWS is 'enterprise real deal do what I say', but I don't think that means you should completely exclude the customer story of any new developers just getting their feet wet. It's an area of customer obsession the business has regrettably lacked, and if you go by the continual stories of people messing it up on HN/twitter/reddit/etc, it only becomes more glaring how little the new guys are being taken care of.


But it's kinda the same as a trial where you have to put in a credit card number.

If they auto-charge once the trial is over, I don't like them. That is a dark pattern.

Equally, with this, AWS could very well ask you, the user, what you'd like to do if you surpass the free tier. Charge? Or turn it all off.

On top of that they could instate default thresholds so that you, the person who just started their free trial, don't get bill shock when you forget to turn off that $200/h machine.


Almost all trials tell you how much they'll charge you.


My threshold is $0. I was never expecting to get billed on the free tier.


Most cloud providers work this way somehow. Flexible, pay as you go infra doesn’t cope well with fixed pricing.

Fixed price cloud offerings exist for some services, but can end up with an apparently larger sticker price.


Related:

Jeff Barr acknowledges S3 unauthorized request billing issue - https://news.ycombinator.com/item?id=40221108 - May 2024 (18 comments)

How an empty S3 bucket can make your AWS bill explode - https://news.ycombinator.com/item?id=40203126 - April 2024 (111 comments)


The system works! Just raise your concerns and they'll get around to it in [checks notes] 18 years

https://twitter.com/cperciva/status/1785402732976992417


Ahh I see the problem. The steps to get it resolved were not to tell the team about it.

The steps were to raise a big enough fuss that it would undermine customer trust if the team didn't fix it.


twitter support tier is the highest of all!


In the same timespan Microsoft released Windows 1 all the way up to XP


Why have things stagnated?


People put up with it.


actual innovation has just moved to different problems, now that the desktop OS has been mostly "solved" – until SteamOS showed a fresher way forward, that is.


In fairness, the issue was attended to within weeks after it recently got attention.


So, another way of saying this is, it took more than a decade to get attention?


We prefer to call it "eventual consistency".


Realistically, Amazon didn't have the scale/resources to mitigate/manage this problem for customers back in the day. It also wasn't a target like it is now. Even a decade ago, it was a comparatively small problem that was no doubt simpler to address on a case-by-case basis.

Being responsive isn't about having infinite resources. It's about prioritization. I doubt at the time this was anywhere near the top of the list for them to fix even for the person who tweeted years ago.


something something customer obsession


You made more or less the exact same comment on a recent thread. What does it add to this conversation? https://news.ycombinator.com/item?id=40221193


I did, but I think it's worth stressing that this didn't actually have the quick two week turnaround you might assume if you first heard about it from the billing horror story posted on here recently. It's been known about forever and only became a priority when it turned into a PR issue.


Adds historical context as to the duration of an extremely long-lasting problem?


As an HN reader it added valuable context for me that I was unaware of.


It gave me pleasure.


To be fair, it is quite a remarkable customer service story and relevant to the article.


We've done it. Now let's re-engineer our apps to use error codes for 200 responses and get free S3 usage.


I worked on a team with similar cost optimisation gurus... They abused HTTP code conventions and somehow managed to wedge two REST frameworks into the Django app that at one point had 1m+ users...


Another fan of parasitic computing - https://en.wikipedia.org/wiki/Parasitic_computing


I don't know if this counts, but I had something that I would call parasitic happen to me once.

I administered a VBulletin forum, and naturally, we installed all sorts of gewgaws onto it, including an "arcade" where people could play games, share high scores, etc.

This arcade, somehow, came with its own built-in comment system, one where users could somehow register without registering for a proper VBulletin user account on our instance, and thus without admins being notified.

One day, we discovered this whole underbelly community that had apparently been thriving under our metaphorical floorboards, and promptly evicted them. In hindsight, I probably should have found some way to let them stick around, but recently several things had happened that hardened our stance toward any sort of unwanted users.


Relevant xkcd: https://xkcd.com/1305/


If I understand TFA, you'd need to find a way to get S3 (which offers no server-side script execution, only basic file delivery) to emit an error code (403 specifically) alongside a response of useful data. Good luck...


Simple. Just encode all of your app's data and logic as a massive lookup table, each bit of which is represented by an object that either doesn't exist (a zero) or is unauthorized to access (a one).

When you read a sequential series of keys (404 403 403 404 = 0110) it will either tell you the data you were looking for or the next key name to begin reading from.
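
A toy decoder for that scheme, sketched in Python (the bucket URL and key layout are made up; note it only works if the bucket is configured so that missing keys actually return 404 to anonymous callers, which in practice means granting public list permission, while the objects themselves stay private):

    import requests

    BUCKET_URL = "https://example-bucket.s3.amazonaws.com"   # hypothetical bucket

    def read_bits(prefix, count):
        """Read `count` bits by probing sequential keys and mapping status codes to bits."""
        bits = []
        for i in range(count):
            status = requests.head(f"{BUCKET_URL}/{prefix}/{i}").status_code
            if status == 403:          # object exists but is private -> 1
                bits.append(1)
            elif status == 404:        # object absent -> 0
                bits.append(0)
            else:
                raise RuntimeError(f"unexpected status {status}")
        return bits

    # read_bits("lookup/some-table", 4) -> e.g. [0, 1, 1, 0]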


You can also perfectly parallelize those requests, making the operation highly efficient!


It said "never incur request or bandwidth charges". I assume this means you don't pay to compute the response or for the bandwidth to deliver it.

Seems you could compute the response, store it somewhere (memcached or something), and then return an error. Then have the caller make another call to retrieve the response. (To associate the two requests, have the caller generate a UUID and pass it on both calls.)

That doesn't make it entirely free, but it reduces the compute cost of every request to just reading from a cache.

(This does sound like a good way to get banned or something else nasty, so it's definitely not a recommendation.)


Well, you can probably send out one bit a time by updating your ACLs on a clock (with which your clients are also roughly synchronized) and distinguishing between 403 and 404.

take an awful lot of time to get that data out, though.


It seems to me you could just use static ACLs and create (or not) object names to cause this 403 vs 404 distinction? The drawback is that you'll be paying for the minimum retention of minimum-sized objects, not to mention all the other bucket management traffic you are using.

So you're going to need a lot of consumers of the same bit stream before you've somehow made the covert, "free" egress a net positive value versus a regular object. I imagine AWS can trivially put in place some throttling of error responses to make this impractical.

Ignoring these economic issues, imagine a content-addressing scheme like /stream-identifier/bitnumber which you can then poll to fetch one bit per request. Populate an object (which will return 403) for 1 bits and omit an object (which will return 404) for 0 bits.

You also need to know some stream length or "end of stream" limit. Otherwise you can't tell if you've read past the end or are really fetching 0 bits of a longer stream.

One strategy might be to use an 8b/10b encoding so you can detect when you're not getting a valid symbol anymore. You could treat that as end of stream if it is supposed to be static, or go into some polling mode to wait for more symbols to be posted.

Hybrid strategies might use regular objects or recursive use of these streams to publish metadata streams that tell you about the available stream names, lengths, and encoding schemes.
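
For completeness, the publishing side of such a stream could look something like this sketch (bucket name hypothetical, objects left with their default private ACL; length/end-of-stream signalling omitted, as discussed above):

    import boto3

    s3 = boto3.client("s3")
    BUCKET = "example-bucket"   # hypothetical bucket

    def publish_bits(stream_id, bits):
        """Create a private zero-byte object per 1 bit; omit the key entirely for 0 bits."""
        for i, bit in enumerate(bits):
            if bit:
                # Private object -> anonymous read returns 403 (a 1); absent key -> 404 (a 0).
                s3.put_object(Bucket=BUCKET, Key=f"{stream_id}/{i}", Body=b"")

    publish_bits("streams/demo", [0, 1, 1, 0, 1, 0, 0, 1])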


> take an awful lot of time to get that data out, though.

That’s what glacier is for!


> For buckets configured with website hosting, applicable request and other charges will still apply when S3 returns a custom error document or for custom redirects.

I was wondering about that one.


From the previous story, "S3 requests without a specified region default to us-east-1 and are redirected as needed. And the bucket’s owner pays extra for that redirected request."

So will Amazon continue to charge for the redirected 403?


Can't imagine a change like this would be made without some analysis. I'd love an internal view into a decision like this; I wonder if they already have log data to compute the financial loss from the change, or if they have sampling instrumentation fancy enough to write/deploy custom reports like this quickly.

In any case 2 weeks seems like an impressive turnaround for such a large service, unless they'd been internally preparing to acknowledge the problem for longer


> 2 weeks seems like an impressive turnaround for such a large service

I assume they were lucky in that whatever system counts billable requests also has access to the response code, and therefore it's pretty easy to just say "if response == 403: return 0".

The fact that this is the case suggests they may do the work to fulfill the request before knowing the response code and doing billing, so there might be some loophole to get them to do lots of useful work for free...


> do lots of useful work for free

Have often wondered about this in terms of some of their control-plane APIs. A read-only IAM key used as part of C&C infrastructure for a botnet might be interesting: you get a DNS/ClientHello signature to a legitimate and reputable service for free, while stuffing e.g. "DDoS this blog" in some tags of a free resource. Even better if the AWS account belonged to someone else.

But certainly, ability to serve an unlimited URL space from an account with only positive hits being billed seems ripe for abuse. Would guess there's already some ticket for a "top 404ers" internal report or similar


Metering feeds into billing, and the data volumes involved are truly epic. You can kind of see the granularity they're working with if you turn on CloudTrail.


> Can't imagine a change like this would be made without some analysis.. would love an internal view into a decision like this

Sure, here you go: There was some buzz and negative press, so it got picked up by the social media managers, who forwarded it to executive escalations, who looped in legal. Legal realized that what they were doing was borderline fraud and sent it to the VP that oversees billing as a P0. It then got handed down to a senior director who is responsible for fixing it within a week. Comms got looped in to soft announce it.

At no point does anyone look at log data or give a shit about any instrumentation. It is a business decision to limit liability to a lawsuit or BCP investigation. As a publicly traded company it is also extremely risky for them to book revenue that comes from fraudulent billing.


As someone that works at AWS (but not on S3), that's wrong in like eight different ways.

But the only way that matters is the core one - analysis, data, and instrumentation.

AWS does not make these kinds of decisions without a look at the metrics.


As someone who has been involved in high level crisis management issues like this multiple times across various companies I can tell you that in a competent organization it looks nothing like your day-to-day decision making as an engineer or PM. Better yet, as few "rank and file" employees are involved as possible to avoid dangerous situations like you just described.

I don't want to debate the merits of what happened, but a prosecutor is going to open with "AWS billed people for things they never asked for or consented to." You're already fighting an uphill battle that it is not fraud.

Now what is going to save you is intent. If your defense is "yeah we identified the problem and corrected it" you're good to go. If, on the other hand, someone decides to run a fucking metrics report of how much you could lose by stopping the fraud, and god forbid it is ever seen or mentioned in front of anyone in the decision-making path - you now have to deal with mens rea.

If you have material knowledge that someone took "a look at the metrics", shoot me an email. I can help put you in touch with programs that offer financial rewards for whistleblowers.


I find this hard to believe because this issue was known for years.


It only just went viral on social media.

This is how I found out about it last week: https://youtu.be/OWggTcVgiNg?si=RnxDq1y6-yr_SQ8L


Are you for real? Legitimately baffled by your comment.

How about the financial losses of customers that could be DDoS-ed into bankruptcy through no fault of their own? Keeping S3 bucket names secret is not always easy.


I prefer your version: Barr replies to a tweet before gatecrashing the next S3 planning session. "A customer is hurting, folks!". The call immediately falls silent with only occasional gasps heard from stunned engineers, and the gentle weeping of a PM. I wonder if Amazon offers free therapy following an incident like this


Not billing you because a script kiddie ran a script on your S3 bucket is a good start to therapy, I'd say. :)


I was thinking this too. You're giving AWS a lot of credit if you think they're not going to do some kind of analysis about how much they were making (albeit illegitimately) from invalid responses. I'm just surprised that either they didn't do the analysis beforehand, or, if they did (like the parent commenter suspected), that they were able to get the report for that analysis out so quickly.


There needs to be a law that says any user can set a limit on any service or subscription, and the costs cannot surpass this until the budget is upped by the user. At the same time, there should be real-time cost analysis, a breakdown per service, and predicted costs per day.


A law in which country?

Well, GDPR showed a bit that a rather global impact is possible.

If you offer an open service on the internet you need to be prepared that users and misusers will cause costs.

However, if you block it from public access, you as a customer are not offering a public service. It's the cloud provider offering a public service, so it seems like just a basic legal principle that it's the cloud provider who pays for misuse (attempts to access something that is not public). But of course big corporations are not known for fair contracts respecting the legitimate interests of the customer before legal action is on the horizon. I wonder what made AWS wake up here.


Agreed. Don't know what made them wake up, but I did file a complaint about their free tier dark patterns, and the Luxembourg EU GDPR office got involved after my country's GDPR office tried it first. Apparently it is busy with some bigger investigation, so that investigation might've spooked them (not my own application, I don't think).


Now please do this for NXDOMAIN on Route53. This can be a big problem with acquired domains.


I just searched for this and this documentation entry came up:

https://docs.aws.amazon.com/whitepapers/latest/aws-best-prac...

I can't believe that their 'fix' is to set a wildcard DNS entry; this feels somewhat like a joke.
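
For reference, that workaround amounts to something like this boto3 sketch (hosted zone ID, domain, and target are placeholders):

    import boto3

    route53 = boto3.client("route53")

    # Wildcard record so unmatched names resolve instead of producing (billable) NXDOMAIN.
    route53.change_resource_record_sets(
        HostedZoneId="Z0000000000000000000",   # placeholder hosted zone ID
        ChangeBatch={
            "Changes": [{
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": "*.example.com.",
                    "Type": "A",
                    "TTL": 300,
                    "ResourceRecords": [{"Value": "192.0.2.1"}],   # placeholder target
                },
            }]
        },
    )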

Does this mean that a NXDOMAIN response costs more than a successful response?


It has the same cost as a successful response which can quickly add up to a few hundred dollars per month with a couple of DNS enumeration scans.

Google Cloud and Azure also bill DNS like this. Unless you need some of the advanced features you really shouldn't host your DNS in the big cloud providers.


Cloudflare doesn't bill like this, which is part of why I moved off Route 53.


Neither does Digital Ocean, for that matter.


That’s not entirely true - aliases to AWS resources are free (their suggested “workaround”).


You should never actually use Route53 for your domains. Delegate a subdomain like cloud.yourcompany.net to R53 and use that.


Why?

You're not screaming on twitter about it so it will never happen...

That's not analogous.


It’s still a huge problem for people that have purchased domains. I bought one that apparently used to be a BT tracker, and gets on the order of several hundred NXDOMAIN requests per second.

I understand it’s still hitting Route53 infrastructure, but I’m not using it, and it’s not commonplace to charge for NXDOMAIN records. Because of this, I’m unable to host at AWS (prohibitively expensive for my use-case).

It’s worth mentioning that DNS infrastructure for things like this is very cheap (I used to self-host the DNS infrastructure for this domain for ~$2.5/mo), so the upcharge is even worse than what AWS is charging for bandwidth. If they brought it in line with actual costs, I wouldn’t have as much of a problem.
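
Back-of-envelope, assuming Route 53's standard query price is still roughly $0.40 per million (worth checking against current pricing) and a few hundred junk requests per second:

    # ~300 NXDOMAIN requests/second, billed the same as successful queries
    queries_per_month = 300 * 86_400 * 30             # ~778 million
    cost_per_month = queries_per_month / 1_000_000 * 0.40
    print(round(cost_per_month, 2))                   # ~311.04 USD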


A book could be written about AWS "overcharged" services


Totally agree with this. In their defense (not that I like it), obviously the market is willing to pay what they charge. It’s unfortunate that the other big cloud providers haven’t driven prices down that much.


we canceled them successfully ig


Bezos loss-leader product-manager pushes hook deeper into worm.

I fail to see this as progress, YMMV =3



