> When we moved S3 to a strong consistency model, the customer reception was stronger than any of us expected.
This feels like one of those Apple-like stories about inventing and discovering an amazing, brand new feature that delighted customers, while not mentioning the motivating factor: the competing products that already had it. A more honest sentence might have been "After years of customers complaining that the other major cloud storage providers had strong consistency models, customers were relieved when we finally joined the party."
I mainly use GCP but keep hearing how great AWS is in comparison.
Imagine my surprise when porting some GCS code to S3 last year and realizing there is no way to get consistency guarantees without an external lock service.
> I mainly use GCP but keep hearing how great AWS is in comparison.
Where do you keep hearing this?
Having used both, AWS is trash in comparison. It's way too complicated to do anything simple. At work I wish we could migrate to GCP (or just something that's not AWS, really).
IIRC (it has been a while) the difference is that on Amazon it can only be consistent within a region whereas on GCS I believe even multi-region buckets offer strong consistency.
You get this same pattern with a lot of stories about software. Features are often implemented in a way that’s simple for the developers, but not really a great fit for what’s actually needed. Then typically some story is given to justify why the resulting limitations or usability issues are actually a good thing.
S3 is up there as one of my favorite tech products ever. Over the years I've used it for all sorts of things but most recently I've been using it to roll my own DB backup system.
One of the things that shocks me about the system is the level of object durability. A few years ago I was taking an AWS certification course and learned that their durability number means that one can expect to lose data about once every 10,000 years. Since then anytime I talk about S3's durability I bring up that example and it always seems to convey the point for a layperson.
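For context, that figure lines up with a quick back-of-the-envelope (a sketch only, assuming the commonly cited 11-nines annual durability and a 10-million-object workload):

```python
# Illustrative only: 99.999999999% ("11 nines") annual durability per object.
annual_loss_probability = 1e-11
objects_stored = 10_000_000

expected_losses_per_year = objects_stored * annual_loss_probability  # 0.0001
years_per_expected_loss = 1 / expected_losses_per_year

print(f"~1 lost object every {years_per_expected_loss:,.0f} years")  # ~10,000 years
```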
And it's "simplicity" is truly elegant. When I first started using S3 I thought of it as a dumb storage location but as I learned more I realized that it had some wild features that they all managed to hide properly so when you start it looks like a simple system but you can gradually get deeper and deeper until your S3 bucket is doing some pretty sophisticated stuff.
Last thing I'll say is, you know your API is good when "S3 compatible API" is a selling point of your competitors.
> Last thing I'll say is, you know your API is good when "S3 compatible API" is a selling point of your competitors.
Counter-point: You know that you're the dominant player. See: .psd, .pdf, .xlsx. Not particularly good file types, yet widely supported by competitor products.
Photoshop, PDF and Excel are all products that were truly much better than their competitors at the time of their introduction.
Every file format accumulates cruft over thirty years, especially when you have hundreds of millions of users and you have to expand the product for use cases the original developers never imagined. But that doesn’t mean the success wasn’t justified.
Most people use libraries to read and write the files, and judge them pretty much entirely by popularity.
A very popular file format pretty much defines the semantics and feature set for that category in everyone's mind, and if you build around those features, then you can probably expect good compatibility.
Nobody thinks about the actual on disk data layout, they think about standardization and semantics.
I rather like PDF, although it doesn't seem to be well suited for 500MB scans of old books and the like; they really seem to bog down on older mobile devices.
It's designed for that level of durability, but it's only as good as its operations: a single bad change or a correlated set of hardware failures can quickly invalidate the theoretical durability model. Silently corrupting data is possible too.
You're totally correct, but these products also need to be specifically designed against these failure cases (i.e. it's more than just MTTR + MTTF == durability). You (of course) can't just run deployments without validating that the durability property is satisfied throughout the change.
Yep! There’s a lot of checksum verification, carefully orchestrated deployments, hardware diversity, erasure code selection, the list goes on and on. I help run a multi-exabyte storage system - I’ve seen a few things.
This is true. While I prefer non-SaaS solutions generally, S3 is something that’s hard to cost effectively replace. I can setup an AWS account, create an S3 bucket, and have a system that can then persist at least one copy of my data to at least two data centers each within a goal of 1 second. And then layer cross-region replication if I need.
It’s by no means impossible to do that yourself, but it costs a lot more in time and upfront expense.
I've used it for server backups too, just a simple webserver. Built a script that takes the webserver files and config files, makes a database dump, packages it all into a .tar.gz file on Monday mornings, and uploads it to S3 using a "write only into this bucket" access key. In S3 I had it set up so that it sends me an email whenever a new file is added, and anything older than 3 weeks is put into cold storage.
Of course, I lost that script when the server crashed, the one thing I didn't back up properly.
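A minimal sketch of that kind of backup job (paths and bucket are hypothetical; assumes credentials for a write-only IAM user are already configured):

```python
import subprocess
from datetime import date

import boto3

BUCKET = "my-backups"  # hypothetical bucket with a write-only access key
archive = f"/tmp/backup-{date.today()}.tar.gz"

# Dump the database and bundle it with the web root and config files.
subprocess.run(["sh", "-c", "mysqldump --all-databases > /tmp/db.sql"], check=True)
subprocess.run(["tar", "czf", archive, "/var/www", "/etc/nginx", "/tmp/db.sql"], check=True)

# Upload; bucket notifications and lifecycle rules handle the email and cold storage.
boto3.client("s3").upload_file(archive, BUCKET, f"weekly/{archive.rsplit('/', 1)[-1]}")
```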
> but as I learned more I realized that it had some wild features that they all managed to hide properly, so when you start it looks like a simple system, but you can gradually get deeper and deeper until your S3 bucket is doing some pretty sophisticated stuff.
Over the years working on countless projects I’ve come to realize that the more “easy” something looks to an end user, the more work it took to make it that way. It takes a lot of work to create and polish something to the state where you’d call it graceful, elegant and beautiful.
There are exceptions for sure, but often times hidden under every delightful interface is an iceberg of complexity. When something “just works” you know there was a hell of a lot of effort that went into making it so.
I did a GCP training a while back, and the anecdote from one of the trainers was that the Cloud Storage team (GCP’s S3-compatible product) hadn’t lost a single byte of data since GCS had existed as a product. Crazy at that scale.
What it means is in any given year, you have a 1 in 10,000 chance that a data loss event occurs. It doesn’t stack like that.
If you had light bulbs that lasted 1,000 hrs on average, and you had 10k light bulbs and turned them all on at once, then they would all last 1,000 hours on average. Some would die earlier and some later, but the top-line number does not tell you anything about the distribution, only the average (mean). That's what MTTF is: the mean time to failure for a given part. It doesn't tell you whether the distribution of light bulbs burning out is 10 hrs or 500 hrs wide. If it's the latter, you'll start seeing bulbs out within 750 hrs, but if the former, it'd be 995 hrs before anything burned out.
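To see those numbers concretely, here's a quick simulation sketch (an illustration only, assuming a uniform failure distribution centred on the mean, with "width" as the total spread):

```python
import numpy as np

rng = np.random.default_rng(0)
mean_life = 1_000   # hours; the average is identical in both cases
n_bulbs = 10_000

# Treat "width" as the total spread of a uniform failure distribution.
for width in (10, 500):
    lifetimes = rng.uniform(mean_life - width / 2, mean_life + width / 2, n_bulbs)
    print(f"width={width:>3} h: mean ~{lifetimes.mean():.0f} h, "
          f"first burnout ~{lifetimes.min():.0f} h")
# width= 10 h: mean ~1000 h, first burnout ~995 h
# width=500 h: mean ~1000 h, first burnout ~750 h
```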
Object integrity isn’t part of the S3 SLA. I assume that is mostly because object integrity is something AWS can’t know about per se.
You could unknowingly upload a corrupted file, for example. By the time you discover that, there may not be a clear record of operations on that object. (Yes, you can record S3 data plane events but that’s not the point.)
Only the customer would know if their data is intact, and only the customer can ensure that.
The best S3 (or any storage system) can do is say “this is exactly what was uploaded”.
And you can overwrite files in S3 with the appropriate privileges. S3 will do what you ask if you have the proper credentials.
Otherwise, S3 is designed to be self-healing with erasure encoding and storing copies in at least two data centers per region.
Yes but my point stands. If AWS added S3 data integrity to the SLA then it’s now made that commitment contractually. If you add checksum data the checksums would (logically) be required and also be in scope of the SLA. If there was a mismatch between them and the file functioned it would be impossible to sanely adjudicate who is responsible for the discrepancy, or what the nature of that discrepancy might be if no other copies of the file exist.
AWS probably doesn’t want those risks and ambiguities.
I generally see object storage systems advertise 11 9s of durability. You would usually see a commercial distributed file system (obviously stuff like Ceph and Lustre will depend on your specific configuration) advertise less (trading some durability for performance).
In general if you actually do the erasure coding math, almost all distributed storage systems that use erasure coding will have waaaaay more than 11 9s of theoretical durability
S3's original implementation might have only had 11 9s, and it just doesn't make sense to keep updating this number, beyond a certain point it's just meaningless
Like "we have 20 nines" "oh yeah, well we have 30 nines!"
To give an example of why this is the case, if you go from a 10:20 sharding scheme to a 20:40 sharding scheme, your storage overhead is roughly the same (2x), but you have doubled the number of nines
So it's quite easy to get a ton of theoretical 9s with erasure coding
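Running the crude version of that calculation (a sketch: a made-up per-shard failure probability, independent failures, no repair, and no correlated failures, all things a real system has to account for):

```python
from math import comb

def loss_probability(k: int, n: int, p: float) -> float:
    """P(data loss) for a k-of-n erasure code: data is lost only if more than
    n-k shards fail. Assumes independent shard failures with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n - k + 1, n + 1))

p = 0.01  # assumed per-shard failure probability (e.g. within one repair window)
for k, n in [(10, 20), (20, 40)]:
    print(f"{k}-of-{n}: overhead {n/k:.0f}x, P(loss) ~ {loss_probability(k, n, p):.1e}")
# 10-of-20: overhead 2x, P(loss) ~ 1.5e-17  (roughly 17 nines)
# 20-of-40: overhead 2x, P(loss) ~ 1.1e-31  (roughly 31 nines)
```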
It's really not that impressive, but you have to use erasure coding (chop the data D into X parts, use these to generate Y extra pieces, and store all X+Y of them) instead of replication (store D n times).
Why not? I don't work with web-apps or otherwise use object stores very often, but naively I would expect that "my objects not disappearing" would be a good thing.
I think their point is that you'd need even higher durability. With millions of objects, even 5+ nines means that you lose objects relatively constantly.
> I’ve seen other examples where customers guess at new APIs they hope that S3 will launch, and have scripts that run in the background probing them for years! When we launch new features that introduce new REST verbs, we typically have a dashboard to report the call frequency of requests to it, and it’s often the case that the team is surprised that the dashboard starts posting traffic as soon as it’s up, even before the feature launches, and they discover that it’s exactly these customer probes, guessing at a new feature.
This surprises me; has anyone done something similar and benefitted from it? It's the sort of thing where I feel like you'd maybe get a result 1% of the time if that, and then only years later when everyone has moved on from the problem they were facing at the time...
It's funny—S3 started as a "simple" storage service, and now it's handling entire table abstractions. Reminds me how SQL was declared dead every few years, yet here we are, building ever more complex data solutions on top of supposedly simple foundations.
I instinctively distrust any software or protocol that implies it is "simple" in its name: SNMP, SMTP, TFTP, SQS, etc. They're usually the cause of an equal or more amount of headaches than alternatives.
Maybe such solutions are a reaction to previous more "complex" solutions, and they do indeed start simple, but inevitably get swallowed by the complexity monster with age.
TFTP is probably the exception to that rule. All the other protocols started out easily enough and added more and more cruft. TFTP stayed the way it's always been - minimalist, terrifyingly awful at most things, handy for a few corner cases. If you know when to use it and when to use something like SCP, you're golden.
If TFTP had gone the way of SNMP, we'd have 'tftp <src> <dest> --proto tcp --tls --retries 8 --log-type json' or some horrendous mess like that.
TFTP's usefulness in the modern day is strictly for things that don't have a TCP stack. Anything with a TCP stack is better off with HTTP. That doesn't leave much on the table except legacy & inertia.
The true hero in AWS is its authentication and accounting infrastructure.
Most people don't even think about it. But authenticating trillions of operations per second is why AWS works. And the accounting and billing. Anyone who does authentication knows how hard it is. At AWS' scale it's, well, the pinnacle of distributed systems.
A little history: when we were getting ready to do an API at DigitalOcean I got asked "uhm... how should it feel?" I thought about that for about 11 seconds and said "if all our APIs feel as good as S3, it should be fine" - it's a good API.
It's only sad that the SDKs are often on the heavy side, I remember that the npm package used to be multiple megabytes as it bundled large parts of the core AWS SDK. Nowadays, I believe it's still around half a megabyte.
The footprint of the JS SDK is much better as they split all of it into service-specific packages, but the SDK APIs are a bit confusing (everything is a class that has to be instantiated—even config).
Now add versioning, replication, logging, encryption, ACLs, CDN integration, event triggering (Lambda). I could go on. These are just some other features I can name off the top of my head. And it all has to basically run with zero downtime, 24x7...
Anyone can create a CRUD API. It takes a _lot_ of work to make a CRUD API that scales with high availability and a reasonable consistency model. The vast majority of engineers would take months or years to write even a demo.
If you don't believe me, you might want to reconsider how skilled the average developer _really_ is.
I used to have the same opinion until I built my own CDN. Scaling something like that is no joke, let alone ensuring you handle consistency and caching properly.
A basic implementation is simple, but at S3 scale, that's a whole different ball game.
That's when you really know you hid all the complexity well: when people call your globally replicated data store, with granular permissions, sophisticated data retention policies, versioning, and, what, seven (ten?) nines or something, "simple".
No problem. I’m sure ChatGPT could cook up a replacement in a weekend. Like Dropbox it’s just rsync with some scripts that glue it together. How hard could it possibly be?
I mean people serve entire websites right out of s3 buckets. Using it as a crude CDN of sorts.
The point of the article is that making a service at S3's scale with such a simple API exposed to the end user, without exposing the provisioning and durability and security and performance to make it all work...
I have a feeling that economies of scale have a point of diminishing returns. At what point does it become more costly and complicated to store your data on S3 versus just maintaining a server with RAID disks somewhere?
S3 is an engineering marvel, but it's an insanely complicated backend architecture just to store some files.
That's going to depend a lot on what your needs are, particularly in terms of redundancy and durability. S3 takes care of a lot of that for you.
One server with a RAID array can survive, usually, 1 or maybe 2 drive failures. The remaining drives in the array have to do more work when a failed drive is replaced and data is copied to the new array member. This sometimes leads to additional failures before replacement completes, because the drives in the array are probably all the same model, bought at the same time, and thus have similar manufacturing quality and materials. This is part of why it's generally said that RAID != backup.
You can make a local backup to something like another server with its own storage, external drives, or tape storage. Capacity, recovery time, and cost varies a lot across the available options here. Now you're protected against the original server failing, but you're not protected against location-based impacts - power/network outages, weather damage/flooding, fire, etc.
You can make a remote backup. That can be in a location you own / control, or you can pay someone else to use their storage.
Each layer of redundancy adds cost and complexity.
AWS says they can guarantee 99.999999999% durability and 99.99% availability. You can absolutely design your own system that meets those thresholds, but that is far beyond what one server with a RAID array can do.
How many businesses or applications really need 99.999999999% durability and 99.99% availability?
Is your whole stack organized to deliver the aforementioned durability and availability?
I think that this is, to Andy's point, basically about simplicity. It's not that your business necessarily needs 11 9s of durability for continuity purposes, but it sure is nice that you never have to think about the durability of the storage layer (vs. even something like EBS where 5 9s of durability isn't quite enough to go from "improbable" to "impossible").
There are a lot of companies whose livelihood depends on their proprietary data, and loss of that data would be a company-ending event. I'm not sure how the calculus works out exactly, but having additional backups and types of backups to reduce risk is probably one of the smaller business expenses one can pick up. Sending a couple TB of data to three+ cloud providers on top of your physical backups is in the tens of dollars per month.
Different people and organizations will have different needs, as indicated in the first sentence of my post. For some use cases one server is totally fine, but it's good to think through your use cases and understand how loss of availability or loss of data would impact you, and how much you're willing to pay to avoid that.
I'll note that data durability is a bit of a different concern than service availability. A service being down for some amount of time sucks, but it'll probably come back up at some point and life moves on. If data is lost completely, it's just gone. It's going to have to be re-created from other sources, generated fresh, or accepted as irreplaceable and lost forever.
Some use cases can tolerate losing some or all of the data. Many can't, so data durability tends to be a concern for non-trivial use cases.
Probably never. The complexity is borne by Amazon. Even before any of the development begins if you want a RAID setup with some sort of decent availability you've already multiplied your server costs by the number of replicas you'd need. It's a Sisyphean task that also has little value for most people.
Much like twitter it's conceptually simple but it's a hard problem to solve at any scale beyond a toy.
One interesting thing about S3 is the vast scale of it. E.g. if you need to store 3 PB of data you might need 150 HDDs + redundancy, but if you store it on S3 it's chopped up and put on tens of thousands of HDDs, which helps with IOPS and throughput. Of course that's shared with others, which is why smart placement is key, so that hot objects are spread out.
There is a diminishing return of what percentage you save, sure. But amazon will always be at that edge. They already have amortized the equipment, labour, administration, electricity, storage, cooling, etc.
They also already have support for storage tiering, replication, encryption, ACLs, and integration with other services (from web access to sending notifications of storage events to Lambda, SQS, etc). You get all of this whether you're saving 1 eight-bit file or trillions of gigabyte-sized ones.
There are reasons why you may need to roll your own storage setup (regulatory, geographic, some other unique reason), but you'll never be more economical than S3, especially if the storage is mostly sitting idle.
> At what point does it become more costly and complicated to store your data on S3 versus just maintaining a server with RAID disks somewhere?
It's more costly immediately. S3 storage prices are above what you would pay even for triply redundant media and you have to pay for data transfer at a very high rate to both send and receive data to the public internet.
It's far less complicated though. You just create a bucket and you're off to the races. Since the S3 API endpoints are all public there's not even a delay for spinning up the infrastructure.
Where S3 shines for me is two things. The first is automatic lifecycle management: objects can be moved between storage classes based on the age of the object and even automatically deleted after expiration. The second is S3 events, which are also _durable_ and make S3 into an actual appliance instead of just a convenient key/value store.
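A minimal sketch of that lifecycle setup with boto3 (hypothetical bucket and prefix; the day counts and storage class are just examples):

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical rule: after 30 days move "backups/" objects to Glacier,
# after 365 days delete them. S3 applies this automatically; no cron needed.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-bucket",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "age-out-backups",
            "Status": "Enabled",
            "Filter": {"Prefix": "backups/"},
            "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
            "Expiration": {"Days": 365},
        }]
    },
)
```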
The per GB price on S3 is higher than on bulk HDDs. This is easily observed. What you are saying is your data storage needs don't even justify a single HDD. This is a scaling issue and not a pricing issue.
Oh, so “it’s more costly immediately” actually meant “it’s more costly once you’re storing over some threshold of data.” Ok. I can get behind that.
I don't think it's so easily observed at scale though, because at that point it's hardly just the HDD cost anymore. It's the HDD, server/compute, cabling, cooling, power, facilities, security, maintenance.
The TCO of data storage isn't just the drive; it just so happens that S3 is still cheaper than even a single drive up to some threshold of data. I don't know of anyone having done a full cost-model comparison. Everything I've ever seen assumes the data center is free.
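To make that concrete, a back-of-the-envelope sketch with assumed, purely illustrative prices (real numbers vary by region, drive model, and what you count as overhead; the answer swings almost entirely on the overhead multiplier and staff time):

```python
tb_stored = 50

# Assumed prices, for illustration only.
s3_price_per_gb_month = 0.023      # roughly S3 Standard list price
drive_cost_per_tb = 15.0           # raw HDD
replication_factor = 3             # triple redundancy
drive_lifetime_months = 60
facility_overhead = 3.0            # multiplier for server, power, space, labour

s3_monthly = tb_stored * 1024 * s3_price_per_gb_month
diy_monthly = (tb_stored * replication_factor * drive_cost_per_tb
               / drive_lifetime_months) * facility_overhead

print(f"S3:  ~${s3_monthly:,.0f}/month")   # ~ $1,178
print(f"DIY: ~${diy_monthly:,.0f}/month")  # ~ $113, before paying anyone to run it
```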
Since everything you need to run "a full SPA" is to serve some static files over an internet connection I'm not sure how that tells you anything interesting about the platform. It's basically the simplest thing a web server can do.
We've used Netlify on previous projects because it was easy. No AWS accounts or knowledge needed; just push to master, let the CI build (it was a Gatsby site), and it was live.
I think Netlify is great but to me it's overkill if you just have a static site.
I understand that Netlify is much simpler to get started with and setting up an AWS account is somewhat more complex. If you have several sites, it's worth spending the time to learn.
"I think one thing that we’ve learned from the work on Tables is that it’s these _properties of storage_ that really define S3 much more than the object API itself."
Between the above philosophy, S3 Tables, and Express One Zone (SSD perf), it makes me really curious about what other storage modalities S3 moves towards supporting going forward.
It's great that they added Iceberg support I guess, but it's a shame that they also removed S3 Select. S3 Select wasn't perfect. For instance the performance was nowhere near as good as using DuckDB to scan a Parquet file, since Duck is smart and S3 Select does a full table scan.
But S3 Select was way cheaper than the new Iceberg support. So if your needs are only for reading one Parquet snapshot, with no need to do updates, then this change is not welcome.
Great article though, and I was pleased to see this at the end:
> We’ve invested in a collaboration with DuckDB to accelerate Iceberg support in Duck,
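For the single-snapshot case described above, the DuckDB route looks roughly like this (a sketch; bucket, path, and columns are hypothetical, and credential setup is elided):

```python
import duckdb  # assumes the duckdb package; the httpfs extension provides s3:// support

con = duckdb.connect()
con.execute("INSTALL httpfs")
con.execute("LOAD httpfs")
con.execute("SET s3_region = 'us-east-1'")  # credentials can come from the environment

# Scan a Parquet snapshot straight off S3, reading only what the query needs.
rows = con.execute("""
    SELECT customer_id, SUM(amount) AS total
    FROM read_parquet('s3://my-bucket/snapshots/2024-06/orders.parquet')
    GROUP BY customer_id
    ORDER BY total DESC
    LIMIT 10
""").fetchall()
```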
For those interested in S3 Tables which is referenced in this blog post, we literally just published this overview on what they are and cost considerations of them that people might find interesting: https://www.vantage.sh/blog/amazon-s3-tables
Lakehouse is an architecture defined to overcome the limitations associated with an immutable object store. To my eyes it already introduces unnecessary complexity (i.e. at what point do I just need to transition to a proper database, even at larger scales that supposedly cannot be accommodated by a database? when is a tiered data architecture with stream materialisation snapshots actually simpler to reason about and more economic? etc.)
I would hope that S3 could introduce a change in the operation of said fundamental building block (the immutable object), rather than just slap on an existing downstream abstraction. That's not what I call design for simplicity. As an external observer, I would think that's internal Amazon moat management with some co-branding strategy.
Lots of comments here talking about how great S3 is.
Anyone willing to give a cliff notes about what's good about it?
I've been running various sites and apps for a decade, but have never touched S3 because the bandwidth costs are 1, sometimes 2 orders of magnitude more expensive than other static hosting solutions.
One underappreciated feature of S3 - that allowed it to excel in workloads like the Tables feature described in the article - is that it's able to function as the world's highest-throughput network filesystem. And you don't have to do anything to configure it (as the article points out). By storing data on S3, you get to access the full cross-sectional bandwidth of EC2, which is colossal. For effectively all workloads, you will max out your network connection before S3's. This enables workloads that can't scale anywhere else. Things like data pipelines generating unplanned hundred-terabit-per-second traffic spikes with hotspots that would crash any filesystem cluster I've ever seen. And you don't have to pay a lot for it - once you're done using the bandwidth, you can archive the data elsewhere or delete it.
You've totally hit the nail on the head. This is the real moat of S3, the fact that they have so much front-end throughput available from the gigantic buildout that folks can take advantage of without any capacity pre-planning.
There are a few things about S3 that I find extremely powerful.
The biggest is that if I need to store some data, and I know what the data is (so I don't need to worry about traversing a file structure at a moment's notice, for example; I know my filenames), I can just store it. I don't need to figure out how much space I need ahead of time, and it's there when I need it. Maybe it automatically moves to another storage tier to save me some money, but I can reliably assume it will be there when I need it. That simplicity alone is worth a lot: I never need to think later about expanding space, possibly introducing downtime depending on the setup, maybe dealing with partitions, etc.
Related to that is static hosting. I have run a CDN and other static content out of S3 with CloudFront in front of it. The storage cost was almost nonexistent due to how little actual data we were talking about, and we only paid CloudFront costs when there were requests. If nothing was being used it was almost "free". Even when being used it was very cheap for my use cases.
Creating daily inventory reports in S3 is awesome.
But the thing that is really almost "magic" once you understand its quirks is Athena (and QuickSight built on top of it, and similar tools). The ability to store data in S3, like the inventory reports I already mentioned, access logs, CloudWatch logs, or any structured data that you may not need to query often enough to warrant a full long-running database. It may cost you a few dollars to run your Athena query and it is not going to be super quick, but if you know what you're looking for it is amazing.
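The Athena pattern is basically: define a table over the data sitting in S3, then query it on demand. A minimal sketch with boto3 (database, table, and result bucket are hypothetical):

```python
import boto3

athena = boto3.client("athena")

resp = athena.start_query_execution(
    QueryString="""
        SELECT remote_ip, COUNT(*) AS hits
        FROM access_logs
        WHERE year = '2024' AND month = '06'
        GROUP BY remote_ip
        ORDER BY hits DESC
        LIMIT 20
    """,
    QueryExecutionContext={"Database": "weblogs"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
query_id = resp["QueryExecutionId"]
# Poll get_query_execution(QueryExecutionId=query_id) until it finishes,
# then read rows with get_query_results or pick up the CSV written to S3.
```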
S3 has often fallen into a "catch all" solution for me whenever I need to store data large enough that I don't want to keep it in a database (RDBMS or Redis).
Need to save a file somewhere? Dump it in S3. It's generally affordable (obviously dependent on scale and use), fast, easy, and super configurable.
Being able to expose something to the outside, or with a presigned URL is a huge advantage as well.
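Presigned URLs are nearly a one-liner with boto3 (a sketch; bucket and key are hypothetical):

```python
import boto3

s3 = boto3.client("s3")

# Hand out a time-limited download link without making the bucket public
# or sharing credentials.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-bucket", "Key": "reports/2024-06-report.pdf"},
    ExpiresIn=3600,  # valid for one hour
)
print(url)
```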
I think of application storage generally in this tier ordering (just off the top of my head based on the past few years of software development, no real deep thought here):
1. General application data that needs to be read, written, and related - RDBMS
2. Application data that needs to be read and written fast, no relations - Redis
3. Application data that is mostly stored and read - S3
Replace any of those with an equivalent storage layer.
Do you need to replace your SFTP server? S3. Do you need to backup TB of db files? S3. Do you need a high performance web cache? S3. Host your SPA? S3 backs cloudfront. Shared filesystem between desktop computers? Probably a bad idea but you can do it with S3. Need a way for customers to securely drop files somewhere? Signed S3 URI. Need to store metrics? Logs? S3. Load balancer logs? S3. And it's cheaper than an EBS volume, and doesn't need resizing every couple of quarters. And there are various SLAs which make it cheaper (Glacier) or more expensive (High Performance). S3 makes a great storage backend for a lot of use cases especially when your data is coming in from multiple regions across the globe. There are some quibbles about eventual consistency but in general it is an easy backend to build for.
Bandwidth is free within AWS, and ingress is also free.
I've used it lately in the following setup (a rough sketch of the Lambda side follows the list):
- stuff on the internet pushes data into my bucket => this is mostly free (you only pay for S3 operations)
- on object creation, an event is fired and a Lambda is spawned => this is free
- the Lambda code reads the object => this is mostly free (again, you only pay for S3 operations)
- the Lambda processes the data, trims it, repackages it, compresses it => you pay for the compute
- the Lambda stores the resulting data somewhere else and deletes the S3 object => you pay the egress
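A minimal sketch of that Lambda handler (the destination bucket is hypothetical and the "processing" step is just a stand-in):

```python
import gzip
import urllib.parse

import boto3

s3 = boto3.client("s3")
DEST_BUCKET = "my-processed-data"  # hypothetical destination

def handler(event, context):
    # One invocation can carry several "ObjectCreated" records.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        repackaged = gzip.compress(body.strip())   # stand-in for real processing

        s3.put_object(Bucket=DEST_BUCKET, Key=key + ".gz", Body=repackaged)
        s3.delete_object(Bucket=bucket, Key=key)   # clean up the source object
    return {"processed": len(event["Records"])}
```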
S3 is great for being able to stick files somewhere and not have to think about any of the surrounding infrastructure on an ongoing basis [1]. You don't have to worry about keeping a RAID server, swapping out disks when one fails, etc.
For static hosting, it's fine, but as you say, it's not necessarily the cheapest, though you can bring the cost down by sticking a CDN (Cloudflare/CloudFront) in front of it. There are other use cases where it really shines though.
[1]: I say ongoing basis because you will need to figure out your security controls, etc. at the beginning so it's not totally no-thought.
I really enjoy using S3 to serve arbitrary blobs. It perfectly solves the problem space for my use cases.
I avoid getting tangled in authentication mess by simply naming my files using type 4 (random) GUIDs and dumping them in public buckets. The file name is effectively the authentication token, and expiration policies are used to deal with the edges.
This has been useful for problems like emailing customers gigantic reports and transferring build artifacts between systems. Having a stable URL that "just works" everywhere easily pays for the S3 bill in terms of time & frustration saved.
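That pattern is simple enough to sketch (hypothetical bucket; assumes the bucket policy allows public reads and a lifecycle rule handles expiry):

```python
import uuid

import boto3

s3 = boto3.client("s3")
BUCKET = "my-public-artifacts"  # hypothetical bucket with public-read objects

def publish(data: bytes, content_type: str = "application/octet-stream") -> str:
    # A random v4 UUID is the key; the unguessable URL is effectively the token.
    key = str(uuid.uuid4())
    s3.put_object(Bucket=BUCKET, Key=key, Body=data, ContentType=content_type)
    return f"https://{BUCKET}.s3.amazonaws.com/{key}"  # URL form varies by region
```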
My favorite use case for S3 API-compatible solutions: I often run into systems that generate lots of arbitrary data that only have temporary importance. A common example might be intermediate build artifacts, or testing ephemera (browser screenshots, etc). Things that are needed for X number of months and then just need to disappear.
Yeah, we can dump those to a filesystem. But then we have to ask which filesystem? What should the directory layout be? If there are millions or billions of objects, walking the whole tree gets expensive. Do we write a script to clean everything up? Run it via cron or some other job runner?
With S3, you just write your artifact to S3 with a TTL and it gets deleted automagically when it should. No cron jobs, no walking the whole tree. And you can set up other lifecycle options if you need it moved to other (cheaper) storage later on, backups, versioning, and whatever else.
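A sketch of that TTL setup (hypothetical bucket; here the "TTL" is a tag-scoped lifecycle expiration rule rather than a per-object timer):

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "ci-artifacts"  # hypothetical bucket

# One-time setup: anything tagged ttl=90d is deleted by S3 after 90 days.
s3.put_bucket_lifecycle_configuration(
    Bucket=BUCKET,
    LifecycleConfiguration={
        "Rules": [{
            "ID": "expire-tagged-ephemera",
            "Status": "Enabled",
            "Filter": {"Tag": {"Key": "ttl", "Value": "90d"}},
            "Expiration": {"Days": 90},
        }]
    },
)

# Day to day: tag ephemeral artifacts as you write them; no cron, no tree walks.
with open("failure.png", "rb") as f:
    s3.put_object(
        Bucket=BUCKET,
        Key="screenshots/run-1234/failure.png",
        Body=f.read(),
        Tagging="ttl=90d",
    )
```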
For on-prem, you have Minio, Garage, or SeaweedFS. These are pretty nice to deploy the servers however you need for the level of reliability/durability you require.
A good security model doesn't let stupid people do stupid things with warnings. Stupid people ignore warnings.
A good model protects stupid people from themselves.
A password model that says "you MUST have a strong password" protects stupid people from their own stupidity. A model that says you can use bad passwords if you click through warnings is a shite model. Stupid people ignore warnings.
AWS typically considers the "GA" milestone as the "public launch" date, which is silly because the criteria for what is good enough for GA has changed over the years.
S3 was one of the first offerings coming out of AWS, right? It's pretty legendary and a great concept to begin with. You can tell by how much sense it makes, and then trying to wrap your head around the web dev world pre-S3.
? The web dev world pre-S3 was pretty much the same, but you stored your files on a regular server (and set up your own redundancy and backup strategy). Not that much different to be honest from an end user's point of view.
At a lot of places there wasn't even a redundancy nor backup strategy, so it really was just as simple as registering with a hosting company and ssh+ftp (or cPanel or something like that for what amounted to the managed solutions of the time).
I agree, things before S3 weren't really that different. LAMP stacks everywhere, and developer skills were very portable between different deployments of these LAMP stacks. Single machines didn't scale as much then, but for most small-medium sites they really didn't need to.
If you read the article, they say exactly that. It's by Dr. Werner Vogels; they know exactly what goes into S3, since they are a principal engineer on the project.
If only metadata could be queried without processing a CSV output file first (imagine storing thumbnails in there, even!), copied objects had actual events instead of something you have to dig through CloudTrail for, and you could get the last update time from a bucket to make caching easier.
Whoops, I was wrong! You can store base64-encoded metadata on the object and then run a HEAD request to get it, but it's limited to 2 KB.
Also, you ~can~ query the metadata, but its latency is more suited to batch processing than Lambdas.
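A sketch of that HEAD-request trick (hypothetical bucket and key; the roughly 2 KB cap covers all user-defined metadata combined):

```python
import boto3

s3 = boto3.client("s3")

# User-defined metadata rides along as x-amz-meta-* headers and comes back
# on a HEAD request, without fetching the object body.
s3.put_object(
    Bucket="my-bucket",
    Key="images/cat.jpg",
    Body=b"...image bytes...",
    Metadata={"width": "1024", "height": "768"},
)

head = s3.head_object(Bucket="my-bucket", Key="images/cat.jpg")
print(head["Metadata"])  # {'width': '1024', 'height': '768'}
```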
Table stakes is a poker term meaning the absolute minimum amount you are allowed to bet. So the title translates to "In S3, simplicity is the bare minimum" or "In S3, simplicity is so important that if we didn't have it, we might as well not even have S3."
It's a risky idiom in general because it's often used to prevent debate. "Every existing product has feature X, so feature X is table stakes." "Why are we testing whether we really need this feature, it's table stakes!" My observation has been that "table stakes" features are often the best ones to reject. (Not so in the case of this title, though)
Clicked on the article thinking it was about S3 Graphics, the company that made the graphics chip in my first PC. Now I see it's some amazon cloud storage thing.