Those blog posts are a good start. I was referencing the full pricing page for s3 that breaks down by region and class. AFAIK there are some third parties that track the granular data, but it's not preserved anywhere on the AWS page. A lot of price decreases also might have happened silently (without a blog post) or as de facto decreases (widely but privately negotiated, e.g. "sticker price").
> I was referencing the full pricing page for s3 that breaks down by region and class.
You were referring to the announcements and these announcements are still available.
The actual pricing pages of course only show current prices. What would be the motivation for AWS to show outdated prices there as well? No other company I can think of does that and given the complexity of the pricing structure for AWS services that'd only confuse users even more.
> A lot of price decreases also might have happened silently (without a blog post)
Do you have an example for that or is this just hearsay?
> or as de facto decreases (widely but privately negotiated, e.g. "sticker price")
Privately negotiated deals aren't regular price reductions as they're only available for customers with a fairly large spend for such AWS services.
I can say that price _increases_ happened for a few AWS products in specific regions/countries without blog posts. Often AWS simply emails affected customers 1-3 months in advance and after making the change. That said, I imagine prices shifted due to currency, labour costs, energy costs or other regional concerns. And I’ve seen the opposite - I’ve seen AWS freeze prices and eat the cost differences long after they should have raised costs. It varies by product and region and marketing strategy, I expect.
The only rule of thumb is what is posted on the pricing page is supposed to be what you get charged. No APIs exist for the most part, and the actual charges can sometimes differ anyway (e.g. grandfathered pricing etc.)
If that's his core argument then it undermines the entire blog post, right? Not to mention it calls into question other information he's provided. All AWS pricing is available publicly; saying the prices are "held very secretly" and being obviously wrong isn't a good look.
> as de facto decreases (widely but privately negotiated, e.g. "sticker price").
Widely, AWS doesn't negotiate on pricing. Small businesses get the list price or they can switch to another cloud. Medium and large businesses (from 100 employees to Fortune 500) get blanket discounts (excl. some specific line items) in exchange for a spend commitment, e.g. if you commit to at least $10M AWS spend per year, you'll get a 10% discount for all your spend. It doesn't give different discounts to S3 compared to other AWS products.
Do unique top-top-tier customers of AWS get special S3 pricing? Nice for them, but it isn't _widely_.
I don't know about S3 specifically, but with a large enough consumption you can negotiate a specific rate card for individual AWS services in addition to your Enterprise discount.
FWIW $10M is at least 1 order of magnitude, and in some cases 2 orders of magnitude higher than necessary in order to negotiate discounts, depending on services used.
I think at one company I worked, our discount specifically excluded data transfer. It is X% off for all AWS services not including their data transfer costs (both inter-AZ and extra-VPC). It makes sense to me that AWS, with their networking-heavy pricing, negotiates on services and data transfer separately.
I expect pricing is probably pretty bespoke, given there are humans involved. I worked for a company that had X% off all AWS costs, including all data transfer - except for CloudFront, which had a whole separate negotiated rate card (including data transfer, per-request rates, etc.).
AWS's pricing strategy changed a few years ago. Previously, they'd drop prices aggressively with much fanfare every time they did. Nowadays, they almost never drop prices, but release new offerings they say will save money [for certain use cases]:
- Graviton2 is slightly cheaper than Intel instances, with a better price/performance ratio, at least for some applications.
- m6i is the same price as m5, but with a newer generation of CPU and increased network performance.
- gp3 is 20% cheaper than gp2 and has better baseline performance. It can also scale performance independently of size, so no more overprovisioning storage to hit a certain IOPS.
- S3 has IA, Glacier, Glacier Deep Archive, all offering cheaper storage but more expensive retrieval.
Not defending AWS here, just noting that they don't seem to be interested in direct price reductions anymore.
(EDIT: Previously I stated gp3 and gp2 were the same price. Thanks zerocrates and maxxam for the correction.)
This is how every market player is incentivized to behave, apart from regulatory authorities acting to prosecute monopolistic behavior.
1. Price and compete aggressively to grow market share, put competitors out of business, and even transform the market, if possible
2. Once you become the dominant player (i.e. a monopoly), raise prices (or don't lower them) as much as possible without damaging your market position
This is why it's so important, especially in the US, for government to begin aggressively enforcing antitrust law again -- something we've failed to do against companies founded since the 1980s.
Those laws are there for a reason - not just as a weapon against "evil" (though Standard Oil was admittedly pretty bad) but to keep the market healthy and growing. There are so many positive second- and third-order effects that occur only when competition is healthy and well-regulated, and when prices are transparent.
> This is why it's so important, especially in the US, for government to begin aggressively enforcing antitrust law again
It already does and it didn’t particularly stop. The problem is that there is no obvious damage to consumers here (what US antitrust law is based on).
GCP and Azure are both significant players with similar offerings. The fact that people use proprietary Amazon APIs to manage stuff isn’t a high enough bar to show a monopoly.
“We are locked into their product because it’s a big engineering expense to move off” isn’t an argument for monopoly busting. It’s a reflection on poor business decisions by the complainer. It has never worked against Oracle/MS in the past, it won’t start now.
A whole generation of engineers is about to relearn the importance of open source that drove everyone to open source stacks 15 years ago.
> A whole generation of engineers is about to relearn the importance of open source
it's not really about open-source, but about inter-operability and open protocols.
Imagine if AOL internet were the de facto standard, and every website were its own walled garden? Oh wait, no, we already have that - it's the mobile app ecosystem!
The reason the web is so successful (but not monopolistic) is that the HTTP protocol is open, and the HTML standards are open (at least, until Google started meddling, now that they are almost a browser monopoly...)
So laws for anti-trust should now take that into account - platform monopolies can be beaten by forcing interoperability via legislation.
To the software side of the equation --> open source is NOT the answer, abstractions ARE. The problem with programmers TODAY is that they take a technical article written by a cloud provider as gospel, and start slathering large layers of proprietary API calls/technical debt all over their solution - because that's what example code is designed to do, LOCK YOU IN. No need for contract nasties when your devs are doing the work for them!
Use an eggshell architecture: put your dependencies on the edge. Global dependencies are the enemy.
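To make "dependencies on the edge" concrete, here is a minimal sketch. It assumes a hypothetical `BlobStore` interface owned by the application; the names `BlobStore`, `InMemoryBlobStore` and `save_avatar` are made up for illustration and are not from any vendor SDK.

```python
from typing import Protocol


class BlobStore(Protocol):
    """Hypothetical storage interface owned by the application, not by a vendor SDK."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class InMemoryBlobStore:
    """Stand-in adapter; an S3- or R2-backed adapter would implement the same two methods."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


def save_avatar(store: BlobStore, user_id: int, image: bytes) -> str:
    """Core logic depends only on BlobStore, so swapping providers touches one adapter."""
    key = f"avatars/{user_id}.png"
    store.put(key, image)
    return key


store = InMemoryBlobStore()
key = save_avatar(store, 42, b"\x89PNG...")
```

The vendor-specific calls would live only inside the adapter class, so a provider change is one new adapter rather than an edit to every call site.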
The problem with the above narrative in this case (which I agree is generally true), is that's not what AWS did. If you look at the announcement history shared in a sibling comment you'll see that so much of it happened long before the cloud wars really heated up. I find that particularly curious. Is that because margins used to be very very fat and AWS just trimmed them down as economies of scale allowed in an attempt to (unsuccessfully) stave off competition? Is it just coincidence that the system found a natural level of margin efficiency right around when competitive pressures started to ramp up? Something else?
I have zero insight on why it has played out this way. I'd love to know though.
Seems likely that the competitors they were trying to dig into were on-prem/leased machine incumbents more than the nascent cloud competitors, though? Trying to build the brand of "Cloud" as an alternative, to the point where now it's practically the default choice for big enterprise/government customers. They combined that with extremely aggressive credit packages to get startups while they were young, so they would adopt the various services rather than building out their own infra on bare metal and saving a bundle long term.
Though making it harder for cloud competitors was certainly good for them as well.
It's like with any startup, your biggest competitor is what people do currently, not necessarily another company, let alone another startup. For many software startups, the biggest competitor is something like pen and paper, excel, standard email, etc.
Again I want to be clear that I have zero special insight on this based on my previous employment...
I was at a startup that was an early and very large user of AWS. The alternatives for us at the time would have been companies like Rackspace and other innovative (at the time) colo type providers. AWS wasn't super competitive on a $/compute basis at that point, and the credits didn't last very long, but it was wwwaaayyyyy more flexible. Add in the incredible convenience of S3 relative to most other alternatives at the time and it was an easy, though not obviously cheap, option.
The common narrative I hear though is it was the startup focus that won it for AWS. Everyone else was chasing the on-prem and enterprise market as you said. AWS went after startups, dangled some modest credits to make it happen, and they stuck around. The conventional wisdom was this is a terrible mistake. Enterprises pay the bills, startups go bust in an economic downturn (and we're coming out of a cycle at this point so companies are understandably nervous). Except those startups that AWS attracted turned into Netflix, Airbnb, Uber, Lyft, etc. and a whole host of voracious consumers of infrastructure. The startups had become the enterprises. The competitors were _still_ trying to convince enterprises that cloud was safe enough to adopt. They belatedly realized they'd played the wrong game, tempted some of the not-so-startup-anymore companies across with more competitive pricing commitments, and finally the battle began. By this stage AWS had won most of the viable early adopters and used that as the beachhead to grow into the big enterprise and gov areas.
At least that's the narrative I've been told a few times over the years, and it seems plausible and maps onto my own experience. Though all of that experience has been startups and not enterprise/gov so it's a very skewed perspective.
But AWS invented IaaS, started at 100% market share, lost market share for years, and is now below 40%. Egress pricing is anti-competitive but that's a different narrative.
I suspect we'll arrive at a place as a society where regulators closely monitor use of protocols and require companies to fully support open protocols and network APIs.
How much have they had to pay in penalties, and how many significant structural changes have there been? That's the standard to use, not number of lawsuits or even number of verdicts.
Antitrust penalties, more often than not, aren't even as big as the extra profit a company made!
There should be results. Tech giants are too big to fail, as are the big banks and insurers. If we break up the banks and insurers by product offerings, then why not break up big tech?
Each company gets split into 3 smaller firms, each with access to all the IP of the mothership. They get to fight for their customers, and barter over business assets, to figure out who owns what.
I find it so highly ironic that anyone truly believes this would end up better.
They would not fight over customers, in the same way the breakup of AT&T did not result in a fight over customers - only a more complete, nimble, and effective domination of the entire US that lasts to this day.
These companies are hamstrung by their size and internal cultural infighting - you will break them into much more effective units than they can culturally achieve themselves.
The typical trotted out example of Microsoft is an aberration - Microsoft did not believe anything bad would ever happen, was horribly defiant, and refused to prepare. It still bounced back anyway, to a point of serious domination again.
Meanwhile, all of these current tech companies are prepared for this eventuality, as AT&T was.
I'm not convinced either way on whether breaking them up is a good thing.
But for me the biggest "for" argument isn't that the broken up companies will compete with each other (that makes no sense because you'd break them up along business lines where they don't compete anyway).
Instead, for me it is that it removes cross subsidies, both financial (as in "we can give away this loss-making application because we make so much money from X") and marketing ("our new product Y is inferior to the existing product Z, but we can make Y the default in our other apps and then people will just use it").
This removal of cross subsidies does increase competition.
However it's possible to force the removal of cross subsidies by other means (eg, the old "force the user to be able to select the search engine in Internet Explorer" regulation etc).
Yeah, you could kind of see this when Google started making Maps pay its own way, and started charging seriously for Maps API access - suddenly a host of decent alternatives sprang up, because they stopped sucking all of the oxygen out of the market.
Acting on splitting up the monopolies. Making Amazon/Google/Meta into multiple smaller companies. Antitrust fines have now just become the expenses one pays to run a monopoly.
I seemed to remember gp3 being (incrementally) cheaper than gp2... yeah, gp3 is 8 cents per GB-month vs 10 for gp2 in us-east-1. That's actually a reasonably significant price difference, now that I look at it.
There were use cases with gp2 where one would over provision capacity to get IOPS. gp3 let you just dial up IOPS which was much cheaper than over provisioning. This may seem a corner case, but it was moderately common.
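A rough sketch of that over-provisioning math, for anyone who wants to plug in their own numbers. The per-GB prices are the us-east-1 figures quoted above; the 3 IOPS per GB gp2 baseline, the 3,000 IOPS included with gp3 and the price per extra provisioned IOPS are assumptions from memory of the EBS documentation, so check the current pricing page. Burst credits and per-volume IOPS caps are ignored.

```python
# Assumed us-east-1 list prices and limits; verify against the current EBS pricing page.
GP2_PER_GB = 0.10            # $/GB-month (quoted in the thread)
GP3_PER_GB = 0.08            # $/GB-month (quoted in the thread)
GP2_IOPS_PER_GB = 3          # assumption: gp2 baseline scales at 3 IOPS per GB
GP3_BASE_IOPS = 3000         # assumption: gp3 includes 3,000 IOPS at any size
GP3_PER_EXTRA_IOPS = 0.005   # assumption: $/provisioned IOPS-month above the baseline


def gp2_monthly(data_gb: float, iops: int) -> float:
    """gp2: grow the volume until its size-based baseline reaches the IOPS target."""
    size_gb = max(data_gb, iops / GP2_IOPS_PER_GB)
    return size_gb * GP2_PER_GB


def gp3_monthly(data_gb: float, iops: int) -> float:
    """gp3: pay for the data size, plus any IOPS above the included baseline."""
    extra_iops = max(0, iops - GP3_BASE_IOPS)
    return data_gb * GP3_PER_GB + extra_iops * GP3_PER_EXTRA_IOPS


# Hypothetical workload: 500 GB of data that needs 6,000 IOPS.
gp2_cost = gp2_monthly(500, 6000)
gp3_cost = gp3_monthly(500, 6000)
print(f"gp2: ${gp2_cost:.2f}/month, gp3: ${gp3_cost:.2f}/month")
```

Under these assumptions the gp2 volume has to be sized at 2,000 GB just to reach the IOPS target, which is where most of the difference comes from.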
Also AWS should cut their prices, well in this environment maybe holding them is a cut....
It’s noteworthy to me that available memory has not increased in some time. You can find machines with more memory, and perhaps the number of classes has increased, but outside of very narrow envelopes it might be cheaper to have 2 m6i’s with undersubscribed CPUs than to try to increase the amount of memory per core on something else.
I don’t know what data center u6tb-1 is in but it’s not in the ones I use.
X2i is interesting, but may overshoot the mark for the service I’m thinking of, but might be appropriate for another. I may have to try again as to whether I can make the r6i math work.
Storage price reductions slowing down or even coming to a halt makes me wonder what this means for "infinite storage" companies.
I'm talking services like Facebook, Youtube, that are "free" (ad supported) where on a daily basis an absurd amount of new content is added, yet almost nothing ever removed.
If storage needs grow endlessly yet storage costs stopped going down, wouldn't that mean that the model in the long term is unsustainable? Sure you can delay the inevitable (compress content, move old stuff to cold storage) but ultimately storage costs per user goes up whilst income likely does not.
I disagree, I’m still in the datacenter storage business and our service delivery costs have been going down. You have to look at the service as a whole… nvme let us retire frames, etc.
I think hyperscale margins are high, and I think between the many tiers of storage and backend technology they are constantly cooking cost out.
The highest speed storage is cheaper than ever and getting cheaper. That’s the expensive stuff. Our “big” high speed storage array was like $5M in 2012. Now it’s like 10x bigger, 5x faster, and 40% of the cost.
Bulk spinning disk prices drop at a slower rate but historically they are performance bound. The marginal byte of cold data has no marginal cost, as you always have lots of space on disks where you’re short on IO budget.
Really cold workloads that need to live on-prem for reasons and need to be retained for 30+ years can go even cheaper. That stuff lives on tape at a lower price point, like a half penny per raw gig.
> Equally possible is that amazon's S3 margins are getting better.
Which is acceptable if they're providing better value in features. From casually looking around it seems that Backblaze's B2 is much cheaper, but doesn't have a good story around reliability, namely uptime guarantees and simple multi-region redundancy. If Backblaze could match AWS in this regard, and with their Bandwidth Alliance with Cloudflare, they could provide better downward pressure on pricing.
I'm not worried about the ad supported free storage providers. On YouTube the ratio is something like 10,000 views for every 1 upload, so they amortize that storage cost across a lot of other users, not just the person who uploads it. Ratio will be lower on Facebook but even active users probably don't average more than 1 or 2 uploads per day. I've had the same gmail address for over 10 years, never delete anything, and I'm still under 50% of their free storage limit.
Once ancient content no longer produces enough ad revenue for the companies storing it, I suspect they will support it only in paid tiers for users who really want it. Something like this is already happening in Google's paid tiers: each user is valuable as part of Google's advertising audience, but the economics no longer make sense once those users keep many GB on Google's servers.
This already happens at YT. It's pretty common knowledge that after 300 views, the handling of a video changes a lot, including moving from a cheap storage medium to the media used for popular content.
Presages an interesting digital future in which nothing ever truly goes away, it just gets slower and slower. An endless inner migration that never quite hits the stopping point, and which could theoretically be reversed if only the slowed content garnered enough attention.
I'd expect a significantly different scale. Specifically, I'd expect normal tiers ranging from "instant" to "several seconds", then a huge gap, then a rock bottom tape archive.
From a more zoomed out look, you could simplify it to only two speeds. The fast speed taking 0-15 seconds, and the slow speed taking minutes to hours. Any content that's been accessed once or twice in the last week or month would be on the normal tiers. Extremely dead content could fall to the tape tier, but it has nowhere further to fall, and it would take only a tiny amount of activity to rescue it.
I don't really see a reason for there to be a continuous falloff in speed. There's not really anything between hard drives and tape for responsiveness, either existing or proposed, that I'm aware of. Nor is there anything slower than tapes.
I suppose that's what it would look like currently, but as content increases and the aggregate long tail of unpopular material grows ever larger there could be shifts in desirable storage solution characteristics that fit different economic niches. An extremely dense, extremely cheap, and extremely slow WORM storage device could find a place somewhere in the future. Cheap as in order(s) of magnitude less.
The next immediate step from online tapes today could be offline tapes with online indexes & robotic retrieval systems. These exist today. The continuous falloff would be a matter of priority ranking given to content requests-- not merely FIFO-- so ever less popular irrelevant content gets shoved further back in the robotic retrieval queue. A recently iced bit of content might be top priority for the tape loader while something not touched in years might sit hours down the queue. The continuous decline isn't defined by the storage media but instead by the capacity of the retrieval systems. Speed would continue on a slow decline as content increases even more and the low economic value of that content make investing in increased capacity impractical.
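A toy sketch of that "priority, not FIFO" retrieval queue, using Python's standard `heapq`. The ranking key (days since last access) and the content names are hypothetical; a real tape library scheduler would also weigh things like which cartridge is already mounted.

```python
import heapq

# Each request is (days_since_last_access, arrival_order, content_id).
# heapq pops the smallest tuple first, so recently touched content wins,
# and arrival_order breaks ties in FIFO order.
requests = [
    (2000, 0, "wedding-video"),
    (3, 1, "last-weeks-upload"),
    (400, 2, "old-meme"),
    (3, 3, "another-recent-clip"),
]

queue: list[tuple[int, int, str]] = []
for request in requests:
    heapq.heappush(queue, request)

# Drain the queue in the order the tape loader would serve it.
served = [heapq.heappop(queue)[2] for _ in range(len(queue))]
print(served)
```

The oldest content is served last even though it was requested first, which is the "ever slower" effect described above.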
Eventually you get to a point in some far off future where the retrieval time for some obscure bucket of bits is measured in significant fractions of a human lifetime, where a dying grandfather requests a video of his wedding 70 years earlier only to have it arrive just in time for his own grandson's dying moments decades later.
I think I've gone too far imagining unlikely slow storage dystopian futures though, so I'm going to stop now before I start ranting about the Slow God who needs only enough access requests from the masses of his adherents to prioritize his retrieval from the depths of cold storage. But Dante Alighieri warned of what was stored in the coldest depths and it was no god... Oh God what hath this comment awakened?!?!
I don't really see a reason to prioritize content that hasn't been accessed in 2 months over content that hasn't been accessed in 200 months, when picking what tape to get next. Either way there is only one person waiting.
And I'm already assuming the tapes are offline, because online tapes would just be a waste of money.
Another issue is finding enough content suitable for very high latency systems. Right now they seem to basically just be for backups.
Tape and cheap disk are about the same price per GB, but tape is more stable over long periods and doesn't have to be powered (although power for very large, slow and rarely accessed disk is low).
> There's not really anything between hard drives and tape for responsiveness, either existing or proposed, that I'm aware of. Nor is there anything slower than tapes.
It makes me wonder what a storage device would look like that is cheaper than HDD, similar or better storage density, and allows random access, with a trade-off for slower speed?
They cost more than a hard drive and are less dense. And those "archival disc" cartridges Sony makes are even more expensive, with drives that cost more than tape drives.
That content has extreme temporal properties. An image that's uploaded is increasingly less likely to be accessed over time. You can probably store it on very cheap hardware with pretty serious compression.
Not to mention that it's mostly text and images that get uploaded to these sites. Videos are the worst case. Let's say there are 1 billion facebook accounts, each one uploads 1GB of data (seems huge), and let's assume that compression just cancels out replication.
That's 1,000 petabytes. On S3 at list price (roughly 2 cents per GB-month for Standard) that's on the order of $20 million per month, obviously ignoring the exfil/access costs and any negotiated discount.
That's not that much. Obviously you want to keep some of it "hot" - profile pictures, recently uploaded pictures, etc. Hand waving, assuming 1% of the data needs to stay hot (seems high), that's 10PB of data. Certainly you're in "big data" land but it's not like there aren't databases that'll handle it.
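The back-of-envelope numbers are easy to check. The account count, per-account size and hot fraction come from the comment; the price per GB-month is an assumption (approximately the S3 Standard list price at very large volume), and the result is very sensitive to which storage class and discount you plug in.

```python
ACCOUNTS = 1_000_000_000       # from the comment: 1 billion accounts
GB_PER_ACCOUNT = 1             # from the comment: 1 GB each
HOT_FRACTION = 0.01            # from the comment: 1% stays hot
PRICE_PER_GB_MONTH = 0.021     # assumption: approximate S3 Standard list price at scale

total_gb = ACCOUNTS * GB_PER_ACCOUNT
total_pb = total_gb / 1_000_000          # decimal units: 1 PB = 1,000,000 GB
hot_pb = total_pb * HOT_FRACTION
monthly_cost = total_gb * PRICE_PER_GB_MONTH

print(f"total: {total_pb:,.0f} PB, hot: {hot_pb:,.0f} PB")
print(f"storage at assumed list price: ${monthly_cost:,.0f}/month")
```

At the assumed list price the storage bill lands in the tens of millions of dollars per month, so colder tiers, in-house hardware or negotiated rates change the picture a lot.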
I've noticed pictures in Facebook Messenger getting compressed after three or six months. I'm not sure if this is new or if it's just more aggressive now.
I'm certain that in the ToS they reserve the right to remove any video at any time for any reason. Dumping your family videos on a public video sharing site that you don't pay for and expecting them to exist there forever for free was never a good idea.
They’ll probably migrate those people to a “Google Photos” or “Google Drive” storage option so they pay for their usage but still have archive and sharing options.
How does that compare to today’s data though, given larger numbers of higher-bitrate higher-resolution videos?
Perhaps during the first year of youtube’s existence people uploaded an exabyte of video… but if people today are uploading one exabyte per day, then it hardly seems worth the hassle of deleting that first year.
I suspect (but have no data for) that a significant chunk of content on social media is memes and still-image-video. Those are likely to be easy to dedupe or compress aggressively
storage price continues to decrease. the point is that S3 price doesn't drop at close to the same rate. facebook and youtube aren't hosting on S3, so they get to experience the real decreases in storage cost.
The main issue we have with S3 is the extortionary egress bandwidth fee. Storage pricing seems OK, but what's the point if I can't send those files to users?
The margin on bandwidth is enormous. I think Cloudflare did some research into it a while back. But there are rivals providing cheaper S3-compatible storage at scale. Cloudflare's R2 ($15/TB, and no fees for egress). Oracle Cloud Storage ($25/TB, and first 10TB of egress is free). There's Wasabi ($5.99/TB, and no fees for egress or API requests). To name but three. Of course each of those have drawbacks e.g Wasabi is intended for long-term storage, R2 is still in beta etc. But there are options for new data.
I didn't realize Cloudflare R2 was available yet. Took a look based on your mention... "currently in beta". But you can still just sign up and use it already?
Anyone here done that and want to report back?
Ah, I found this posted by its author elsewhere in these threads:
Very useful. Wait.... "no public access" AND "no pre-signed URLs"? Am I misunderstanding what that means? That would seem to rule out the use case of serving files to end users? Or what am I missing?
I haven't used it, but my understanding from reading about it is that the intended use-case is to create a Worker (roughly equivalent to an AWS Lambda) that mediates access to the bucket. So you could have the worker provide public access, or have the worker do some kind of authentication — you just have to roll that part yourself.
I haven't used it either but it certainly sounds like it's open to anyone to use. As @chc mentions, it reads like you use a Worker to control access. So you can make your files public by simply not requiring any authentication. I've just been scrolling through their getting started guide. It looks promising: https://developers.cloudflare.com/r2/get-started
Although one of my use cases is video files (fairly small non-profit usage), which I have understood are not allowed by CloudFlare CDN terms of service (at least at non-"enterprise" tiers?)... it's been confusing to me understanding if I could, for example, serve video files from Backblaze via CloudFlare CDN and Bandwidth Alliance; or with R2, if there's a way to serve video files from R2 to the public that is allowed by tos.
Yes you can serve whatever you want from R2 directly. From [1]:
> The Cloudflare Developer Platform consists of the following Services: (i) Cloudflare Workers, a Service that permits developers to deploy and run encapsulated versions of their proprietary software source code (each a “Workers Script”) on Cloudflare’s edge servers; (ii) Cloudflare Pages, a JAMstack platform for frontend developers to collaborate and deploy websites; and (iii) Workers KV, Durable Objects, and R2, storage offerings used to serve HTML and non-HTML content.
Now it’s important to note that the Cache product does not fall under these supplemental terms (even if you use the Workers API to access it). So if you are caching the video files you’d potentially run into problems (but that would also be true of serving video content from Backblaze that you were caching).
Wasabi's egress is not fully free as you make it sound. Their egress works like this: if you store 1 TB of data, you have 1 TB of free egress, as seen in their FAQ [0]. This means it's unfit for certain cases and another competitor could be a better option.
You are correct, that would be another example of one of _its_ drawbacks. They certainly promote "No Fees for Egress" and "No egress charges" (home page). But yes, whether that actually is the case depends on your usage.
A bit tangential, but in some cases you can use S3 Gateway endpoints, which can dramatically reduce your egress costs if you're just accessing it within a VPC.
Have you looked into other S3 compatible services?
We switched to DO Spaces because of the lower bandwidth and storage fees. The savings were actually quite substantial, and there were no noticeable differences for our use case.
I know there are other services out there that are also S3 compatible and cheaper.
For my use case, DO Spaces was quite bad in terms of artifacts being served from stale caches after being replaced and the 'purge' API being called on the bucket; this was over 2 years ago so maybe it's better now.
This is the other issue with using another cloud, S3 is extremely well documented.
Some other cloud providers give no statement on reliability/availability/consistency. And worse, some providers give statements that violate the CAP theorem.
The big clouds are somewhat reasonably documented, but many smaller vendors leave you guessing, or promise what I know they can't keep.
I think this would be an excellent area for regulation since it is anti-competitive (portability is dramatically reduced since it has been made artificially expensive to move the data to another cloud service) and the cloud services haven't shown any interest in doing something about it on their own.
Regulation for bringing down prices for unnecessary cloud services like hosting a file?
Let's focus on things users cannot change. Using the cloud to host files is an easy and expensive way to store files that should be a luxury or a tax on the foolish.
Are you seriously suggesting that S3 is a monopoly with the means to block competition? I mean, even when discussing cloud in general, what you are suggesting is a joke.
I've used it to store large datasets that are processed within a region, a backup system, logs and metrics, and as an origin for CloudFront. I think you're referring to using it to host a consumer download service; for serving web objects in a serious business you'll need a CDN anyway.
All S3 usage I have encountered in the wild has been user submitted files. Stuff like profile pics, document attachments, etc. You could in theory store these on the VMs storage but that encounters the frequent problem that the storage fills up. While S3 is unlimited and easy to integrate.
Hard disks have been getting larger capacity but I/O capability is fairly flat. The result is that price/GB is dropping but price/IOPS is not. So _cold_ storage is where you would expect to see pricing fall. I don't follow AWS pricing closely but I've seen a lot of news around Glacier over the years so that might reflect this fact.
The capacity and speed curves of drives are quite different. When the first TB drives were released about 15 years ago, the fastest ones were about 100MB/s. Now you can get 20TB drives (20x capacity) and the fastest ones are about 300MB/s (3x speed). It is just far easier and cheaper to make a drive twice as big than it is to make it twice as fast.
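One way to see the gap is the time needed to read an entire drive end to end. This is only a sketch using the round numbers above, with decimal units and best-case sequential throughput (real drives slow down on inner tracks and under random access).

```python
def hours_to_read(capacity_tb: float, throughput_mb_s: float) -> float:
    """Hours to read a whole drive sequentially at its best-case throughput."""
    capacity_mb = capacity_tb * 1_000_000   # decimal units: 1 TB = 1,000,000 MB
    return capacity_mb / throughput_mb_s / 3600


old_drive = hours_to_read(1, 100)    # early 1 TB drive at ~100 MB/s
new_drive = hours_to_read(20, 300)   # current 20 TB drive at ~300 MB/s
print(f"1 TB: {old_drive:.1f} h, 20 TB: {new_drive:.1f} h")
```

So a full scan (or a rebuild) goes from under three hours to most of a day, even though the drive itself got faster.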
SSDs are a different animal, but have some similar characteristics. Within the same generation (e.g. M.2 PCIe gen 4), you can get drives that have a lot more capacity but have roughly the same access speeds (i.e. the 2TB version is very similar to the 1TB version). The speed increases between generations are much better than with HDDs. PCIe gen 3 drives seemed to max out at about 3500MB/s while the gen 4 drives are about double that. I have seen reports that gen 5 drives might double it again.
With HAMR technology we might get HDD drives with capacities in the 50TB-100TB range. You can bet that the speed won't be 5x current technology even if they get dual-actuators in them. There will need to be some kind of breakthrough technology to improve it significantly.
This is why we need better data management systems. If the meta-data (e.g. file table) is only 1% of the data that is still a lot of data to read in and store in RAM. We need better systems where the file records are much smaller.
https://didgets.substack.com/p/where-did-i-put-that-file
There's now a new Glacier product, called Glacier Instant Retrieval, which supposedly makes it cheaper and faster to retrieve from Glacier, if you do it a limited number of times.
The R2 vs S3 story is pretty interesting. For archival use cases, S3 still wins by a mile, but for running apps, R2 often wins (minus lacking features).
For my company, neither egress nor storage cost is the big issue. It’s the API call (PUT) cost.
We deal with payloads that are just a little too big for a database (we run Postgres and Clickhouse) but just too frequent (~100 per second) and small (think largish json blobs) to be effective on S3.
We are write heavy. Reads are probably 1% but need to be instant for a good UI and API experience.
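As a rough back-of-envelope sketch of why request cost can dominate at this write rate: the per-request price below is an assumed figure (it is not from this thread, so check the current pricing page for your region and storage class), and `blobs_per_object` is a hypothetical knob for packing several payloads into one PUT.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # 30-day month, an approximation


def monthly_put_cost(puts_per_second: float,
                     price_per_1000_puts: float = 0.005,  # ASSUMPTION: illustrative price
                     blobs_per_object: int = 1) -> float:
    """Monthly request cost in dollars when `blobs_per_object` payloads share one PUT."""
    puts = puts_per_second * SECONDS_PER_MONTH / blobs_per_object
    return puts / 1000 * price_per_1000_puts


print(f"one PUT per payload:   ${monthly_put_cost(100):,.2f} per month")
print(f"100 payloads per PUT:  ${monthly_put_cost(100, blobs_per_object=100):,.2f} per month")
```

At ~100 writes per second that is about 259 million PUTs a month, so the request bill scales directly with how many payloads you can fold into each object.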
What I have seen done before is concatenating many small blobs into a single large blob that is stored on S3. This works great for batch processing afterwards.
If you need read access to the objects, one option is to merge them into a large blob and then create a small index file that keeps the offset of each of the tiny blobs. Then you fetch the index file, find the offset of the tiny blob you want, and do a range request for this offset into the large blob.
This mostly works when you're not read heavy. I recently built an index file for serving HTML files out of a tarball, as an alternative to uploading many small files.
Have you looked at Kinesis Firehose? It was pretty much built for this use case, although you will still need to see if you can define a partitioning scheme, probably in combination with an S3 Select query, to meet your query requirements.
We are using Kinesis. It’s fine. Great actually. We still need to store user logs and generated data persistently. Cold storage is also not an option. This is data that needs to be accessible the moment the event that generates the data happened. Don’t want to push my product too much but I run a synthetic monitoring company. Check my bio and you’ll get a gist of the type of workloads.
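The blob-plus-index idea can be sketched like this. It is a minimal in-memory illustration, not anyone's production code: the function names and the JSON index layout are made up, blobs are assumed to be non-empty, and `read_range` is a local stand-in for what would be a ranged GET (an HTTP `Range: bytes=start-end` header) against the object store.

```python
import json


def pack(blobs: dict[str, bytes]) -> tuple[bytes, bytes]:
    """Concatenate small blobs; return (large_blob, index_json)."""
    index = {}
    parts = []
    offset = 0
    for name, data in blobs.items():
        index[name] = [offset, len(data)]  # where this blob starts and how long it is
        parts.append(data)
        offset += len(data)
    return b"".join(parts), json.dumps(index).encode()


def range_header(index_json: bytes, name: str) -> str:
    """Build the HTTP Range header value for one tiny blob (end is inclusive)."""
    offset, length = json.loads(index_json)[name]
    return f"bytes={offset}-{offset + length - 1}"


def read_range(large_blob: bytes, header: str) -> bytes:
    """Local stand-in for a ranged GET against the stored large blob."""
    start, end = header.removeprefix("bytes=").split("-")
    return large_blob[int(start):int(end) + 1]


large_blob, index_json = pack({"a.json": b'{"x":1}', "b.json": b'{"y":22}'})
header = range_header(index_json, "b.json")
print(header, read_range(large_blob, header))
```

One PUT for the large blob and one for the index replace many small PUTs; the price is an extra round trip for the index on reads, which is why this fits write-heavy workloads better.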
You can host your own S3 API-compatible object storage service on some EC2 instances (exercise left to the reader to figure out how to make that reliable). Zero PUT cost, higher operational overhead.
Yes, but it does blow up TOAST and has a lot of impact on the deletion behavior on busy tables. We removed all larger JSON blobs from PG. Typical settings or config stuff in JSON in PG is fine. We use that all the time. But larger JSON blobs of several kilobytes are still an issue for semi-timeseries data.
Could you elaborate on the TOAST issues you're having? We're pretty liberal with our use of large JSONB objects and might hit a billion objects in a year or so.
I cannot speak for the big companies, but for private cloud storage, I suggest going the self-hosting route. All the tools are available in 2022: ZFS for reliability, Proxmox for separation of concerns, Opnsense/pfsense for access control, Nextcloud for convenience (if you need such file sync at all). Add a photovoltaic plant and your electricity bill will be _Ok_ (you should do this anyway).
I have a 40 TB ZFS Z2 Pool consisting of 6x 8TB drives, and a 16TB offsite pool that is booted for backup snapshots weekly. You'll have to replace the 6 drives running 24/7 approximately every 5 years. If a drive costs $200.00, that will be $1200.00 per 5 years, or $20.00 per month. Add about $400.00 (with PV) to $800.00 (without) for electricity per year ($30.00/$60.00 monthly) and $7.00 monthly for UPS batteries. For these $57.00, you will get a full virtualization feature set under your control, not only a 30TB ingress data sink.
With Amazon Glacier, the cheapest "data sink" cloud storage, 30TB would equal $123.00 monthly (or $30.00 with S3 Glacier Deep Archive), with quite a few feature caveats.
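The arithmetic in the two paragraphs above can be reproduced with a short sketch. The self-hosting inputs are the figures quoted in the comment; the per-GB cloud prices are assumptions that I back-derived from the $123 and $30 totals (and 1 TB is taken as 1024 GB), so treat them as illustrative rather than as current list prices.

```python
def self_hosted_monthly(drives: int = 6, drive_price: float = 200.0,
                        drive_life_years: int = 5,
                        electricity_monthly: float = 30.0,  # $30 with PV, $60 without
                        ups_monthly: float = 7.0) -> float:
    """Monthly cost: drive replacement spread over its lifetime, plus power and UPS."""
    drive_monthly = drives * drive_price / (drive_life_years * 12)
    return drive_monthly + electricity_monthly + ups_monthly


def cloud_monthly(terabytes: float, price_per_gb_month: float) -> float:
    """Monthly storage-only cost, ignoring request and retrieval fees."""
    return terabytes * 1024 * price_per_gb_month


GLACIER_PER_GB = 0.004         # ASSUMPTION: implied by the $123 figure
DEEP_ARCHIVE_PER_GB = 0.00099  # ASSUMPTION: implied by the $30 figure

print(f"self-hosted, with PV:    ${self_hosted_monthly():.2f}")
print(f"self-hosted, without PV: ${self_hosted_monthly(electricity_monthly=60.0):.2f}")
print(f"Glacier, 30TB:           ${cloud_monthly(30, GLACIER_PER_GB):.2f}")
print(f"Deep Archive, 30TB:      ${cloud_monthly(30, DEEP_ARCHIVE_PER_GB):.2f}")
```

Note that this leaves out the purchase price of the server itself and any retrieval fees on the cloud side, just as the comment does.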
If someone is considering the “store at home” route and this makes their head spin then consider just buying a prebuilt nas. You pay for someone doing this for you, but it has similar monthly price in the long run.
It looks like storage costs haven’t changed much since the last S3 price reduction five years ago.
My other take on this is that given how slowly HDD costs are going down at this point, tape is going to remain relevant for some consumers for a lot longer than many of us thought.
As a European, I'll take prices not getting many times as expensive, like our fuel, energy and food prices have. I wonder how the European cloud will fare this winter, and how many European customers we will lose.
> Even with a slowdown of Moore's Law, it seems like AWS has a healthy margin to continue to offer strategic price cuts only when necessary.
Which doesn't surprise me much, really. If your customers are mostly stuck with you, competition is sparse and people pay the price you demand - why would you reduce prices?
B2 isn't great for low latency serving, objects that aren't in hot cache have extremely variable delays on first fetch, and the delay (at least as of a year ago) scales according to object size.
For largeish video (over 500MB) I remember seeing >1 second latency, enough to rule out using it for anything public facing.
Backblaze reliability and performance are below AWS, same for Bunny CDN. Although I understand it can be interesting for some use cases where perf/reliability is not critical.
AWS "reliability" has been the direct cause of a number of sleepless nights for me over the years. Comparing to a few years ago when I worked on a large-scale product hosted on bare metal servers that worked beautifully, I don't think AWS is all it is hyped up to be.
Anecdotal, I know, but even with no experience using Backblaze or Bunny, the bar they would have to meet is a lot lower than you're implying.
I'm talking about my personal experience: on Backblaze the number of 500 errors was simply not acceptable for my use case, likewise for performance and latency. I was a bit disappointed by Bunny CDN rps/latency. But indeed the price is not at all comparable.
Also I'm not talking about any AWS service but more specifically about S3 and CloudFront.
Finally, as I said above, Backblaze and Bunny are amazing if optimizing cost is your main goal.
There are other options available depending on your risk appetite.
For example, I built a file sharing tool (https://www.fileyeet.io/) off the back of Storj (https://www.storj.io/) which is a distributed file storage backed by a crypto coin (maybe one of the few legitimate uses of crypto, although I'm not convinced yet).
Storj was a much cheaper option than S3, although I do have to trust that their systems are as secure as they advertise them to be. Likewise, R2 seems like a good "in-between" option.
Both R2 and Storj share the S3 API for integrating with them.
AWS is great if you have lots of money and little time. Which was very true in the last ~decade, where we had lots of venture-funded startups that had a lot of money, but little time.
Also, while you definitely pay a markup, the standard EC2 pricing also includes instant availability. If you skip buying a car and instead pay a taxi to wait 24/7, your costs will also be insanely high. Additionally, AWS provides a great ecosystem in which your app can easily be managed: things like getting an HTTPS certificate, setting up a redundant load balancer and even a CDN, database or a Kubernetes cluster can be done with a few clicks in a UI. If you don't have someone who knows how to configure those services, it can detract a lot from what you're actually trying to do as a business. Lastly, it has all these enterprise features you suddenly need: solid billing, encryption, certificates etc.
Don't get me wrong, it's expensive, but there's a reason so many businesses use AWS.
It's easy to enter the market and offer a cheap product. It's hard to enter the market and offer a very solid product. It's nearly impossible to enter the market and provide the hundreds of services the big players can provide, and operate at the scale they can provide, servicing the number of markets and customers they do, with the level of support they do.
The big players know that what separates the majors from the minors is trust. If you buy from AWS, you know what you get works, and you will pay a premium for that assurance. And also it is really fricking expensive to be AWS.
Very much this - I don't have much experience with providers in this space but Digital Ocean has taken maybe a decade to build and offer a small number of services.
That said, there is a lot of AWS that I probably wouldn't know existed - I know in my day job I make big use of maybe five services, plus maybe another 10 glue services between them (CloudWatch, IAM, VPC etc).
Uploaded about 100 images. Put those images on a page. 50% of them timed out when downloading. Maybe I was just unlucky, but it was a good way to make me instantly lose all confidence in DO Spaces. S3 and Backblaze B2 worked fine for the same thing at the same time as when this happened.
Which raises my question: Is there some open source github project that lets you create an S3 API compatible with underlying heterogeneous VPS hosts clustered? I'm guessing even with this storage would still be expensive, so how does Backblaze pull this off?
ah yes minio! the one issue i have with using VPS for production is, are they hardened? or is it enough just with proper unix user management and UFW? I have this fear that the VPS box has attack surfaces or zero day vulnerabilities but when I am on the cloud I do not have this worry.
Perhaps irrational but can't argue with the peace of mind that expensive clouds offer. Although we've seen misconfigured S3 buckets leaking data so.
Large fixed costs, economies of scale, whole product (need compute AND storage). It's possible that a company like Cloudflare can disrupt certain verticals in storage that are mispriced by AWS yet have a larger than expected TAM.
Because infrastructure is not that high a cost for most companies, and every dev and devops out there knows AWS and has no clue what competitors' dashboards even look like.
Human biology is not programmed to abide conflict of interest. On paper things can be written just so, but what’s on paper does not stop feelings, friendship, and connection between two people from forming given biology. Pointing at some philosophy to equivocate away physical laws is the simple con leveraged against the people.
The big players leverage their understanding of science and well paid lawyers to play a cognitive game where investment in storage is set aside, as storage is “a solved problem”, they collude to focus government spend on new things they can charge consumers for after charging us via taxes and agency, to build it.
Good luck finding a VC willing to compete against Bezos. They’re not going to target the guy managing the infra risk, providing a cheap platform key to their cheap startup gambling. They’re going to target naive college kids to try and build a rocket for them. Because VCs are smarter than Bezos; do none of the work, own the reward.
I suspect it might be an issue of "cheap enough". For smaller projects, the S3 storage cost doesn't matter too much. Medium-sized projects may find the tradeoff between price and the cost of a custom solution to still be more than good enough. Large enough projects simply choose a different, more specialized solution anyway.
There is such a thing. Like if it ain't broke don't fix it, compute costs aren't always that big, sometimes single-thread is enough even on Wall Street exchanges.
And like come on originally the alternative was hiring employees and managers for those employees and dealing with human error and deception overlapping with the guilt of exploiting them, the whole management game. Difficult to find true leadership, and a work-ethic shared between manager and employee. And the education I got in the nineties and noughts was made for desk jobs with stationery, not computers.
Sometimes automation alone is fine. Even on a computer that hasn't been reset in fifty years and is obsolete according to everyone else, hey if it does the trick. Make sure it stays powered. Cobol. Does the trick. Nuclear plants, they use really old software, one using very new software and connected to the internet was hit by the Morris Worm.
I literally picked up a 32 ounce rock on the street that was intended, judging by its shape and way it was cut out of cement and pebble composite, for stoning. Like for Biblical harlots. It was left behind right after a protest was cleared one Friday afternoon on Portugal and Alameda, Santiago Centro, Santiago Chile. Found it Friday like in April, at 22:16. I roam, I checked out the scene--as is my wont--to see what's up, different graffiti on the walls, and then whoops don't see a lot of pint-sized rocks on the sidewalk, somebody might trip, better clear it. Took it home. Realized I should have worn gloves. Next time I went to hang out in front of the police station I told them about it, hey you could do forensics I said, they're like uh no wrong station for forensics, uh...thanks for clearing it away, those are meant for cops.
I got stoned with similar rocks after watching cops on motorcycles retreat from a mob, didn't click that mob was throwing rocks, kind of aiming at whatever that moved, and fuck did I then move, I sprinted away to safety before the gates closed on me. Luckily I didn't take a direct hit, none nailed me.
But the moral of the story is this: nothing about being tens or hundreds or thousands or in this case millions of years old renders it ineffective as a weapon. Same laws of physics. Same gravity. Muscle equal strength. Rock is just as hard now as it was then. Harder than my skull. Death is bad. Bad then, and bad now.
> Yet, AWS S3 pricing hasn't decreased as fast as the underlying storage costs.
> […]
> Another blog post analyzes the same theory for compute and finds a similar story using pricing data from AWS EC2.
AWS used to do frequent price reductions years ago. At a certain point they seem to have stopped doing that and are now only doing them rarely. That's really a shame as there are still a lot of AWS offerings which are priced way too high (data transfer being the most prominent one).
It'll be interesting to see if and up to which point AWS will keep the prices stable with rising inflation.
Besides storage cost, S3 API access cost can also be high if frequently accessed. And latency is unpredictable.
You can use SeaweedFS Remote Object Store Gateway to cache S3 (or any S3 API compatible vendors) to local servers, and access them at local network speed, and asynchronously sync back to S3.
Like the other comments mentioned, your post misses how transparent AWS has been about price reductions in the past.
More importantly, S3 now has several tiers of pricing depending on how frequently you access the data. So maybe lately they haven’t reduced the pricing of the top tier of S3 but they’ve made it significantly cheaper for other use cases of data. That is very contrary to the comments being made about the innovator's dilemma.
(I used to work at AWS but have no knowledge about pricing decisions)
This is why every cloud migration will eventually get undone: As cloud providers dial up the profit margins, going on prem will be a no-brainer.
If you account for IOPS and not just cost/GB, the pricing structure of HDDs hasn't changed a bit in years. Especially when S3 (the blue spot) used to be way above the HDD cost per GB.
So while I do agree that the AWS pricing structure has changed in recent years, I don't see how the data shown supports that conclusion.
I’m sure AWS’s margins are growing on S3, but to be fair, S3 is more than just raw storage. The majority of the cost and value is the API and the ever-growing features it provides. Those aren’t getting cheaper nearly as fast as the underlying storage hardware is.
It looks like the most recent data point, S3 was actually cheaper than the raw storage, while providing 11 9's of durability, across multiple AZs. Still looks like a pretty good deal, as the price of storage has mostly flatlined in this graph.
Nor does this even account for cold storage, or reduced redundancy.
The prices are in different units. S3 is $/GB-Mo, raw storage is in $/GB. So, when they are equal on the graph, you are paying each month to AWS what the cost of the raw disk is. Now, yes, you need a lot more than just raw disk to effectively store data, but even if you just assume a 5-year lifespan for that disk, the price difference displayed on the graph as equality is actually a 60x difference in plain $.
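The 60x figure follows directly from the units, and a tiny sketch makes the conversion explicit. The function name and the $0.02 example price are made up for illustration; only the 5-year lifespan comes from the comment above.

```python
def lifetime_cost_ratio(s3_per_gb_month: float, disk_per_gb: float,
                        lifespan_years: int = 5) -> float:
    """Total S3 spend over the disk's lifespan divided by the one-off disk price."""
    months = lifespan_years * 12
    return s3_per_gb_month * months / disk_per_gb


# When the two curves touch on the graph, the numbers are equal but the units are not.
# The 0.02 value is a hypothetical price, used only to show the ratio.
print(f"{lifetime_cost_ratio(0.02, 0.02):.0f}x")
```

Equal values on the graph mean you pay the disk's purchase price again every month, so over 5 years (60 months) the ratio is 60 regardless of the actual price level.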
The price has also appeared to flat-line because they are using a linear graph scale for logarithmic data. It should use a log-scale for the y-axis.
Now factor in redundancy and server costs, and the difference is not so huge. You're still paying a multiple of the raw cost, but unless you're storing on the exabyte scale I think it shouldn't really matter in the grand scheme of things.
Cost of a single engineer to manage a Minio cluster probably already outweighs the extra cost you're paying at any reasonable scale (i.e. most companies). And if you're a big player the published costs are not what you're paying.
You also need to house these servers, including backup power etc., manage them, maintain them, develop and deploy the software etc. Also, you should really check the power usage of enterprise disks and servers; it may be cheap, but it's far from comparable to your average desktop. Then you need to add in that they need to have reserve capacity as well; you can go and store 100TB on S3 right now and AWS will be fine with it, but they need to have those disks up and running already.
Don't get me wrong, S3 is expensive, but replicating the availability, feature set and scalability is going to be very expensive, too. You can cheap out if you don't need these features, of course.
It depends on if you're comparing to raw disk, or the replicated storage, but yeah, spending 9x more on network and power than on machines is still pretty extreme.
That's not correct. Here is the history of every (properly tagged) price reduction AWS ever announced: https://aws.amazon.com/blogs/aws/category/price-reduction/