
In praise of S3 - forrestbrazeal
https://info.acloud.guru/resources/brazeal-in-praise-of-s3-the-greatest-cloud-service-of-all-time
======
ryandvm
AWS S3 is a testament to just how successful a service can be if it is SIMPLE
and RELIABLE. I recall using it in its very first incarnation in the 00's.
From an API surface perspective there was barely anything to it, but it nailed
being easy to use, easy to pay for, and as reliable as the sun.

~~~
mtberatwork
I agree that S3 is fantastic, but not sure I would consider it simple
considering how much sensitive data is left wide open in publicly accessible
buckets all the time. Clearly, applying correct permissions seems to be quite
a challenge for even experienced folks. The S3 console doesn't really make
life any easier either, and the terminology can be confusing. Also, applying
IAM/CORS policies, object headers, etc. isn't exactly simple for the layperson.

~~~
alpha_squared
Buckets are private by default and always have been. A bucket needs to be made
public. Given that, I would hazard a guess that buckets are often made public
(when they shouldn't be) either for testing purposes and never reverted, or
because creating proper access to the bucket took too much time/knowledge.

~~~
ratww
That's most certainly the reason.

I sometimes have to support freelancers working on some of our WordPress
websites. Their first instinct when something is wrong on their end is asking
me to run a chmod or chown command they found on Google on the whole
directory. Not that it matters – we're using Docker.

Security seems to be secondary when the priority is to just deliver.

~~~
klodolph
Security _should_ be secondary. I know that sounds wrong. And that’s not an
absolute, sometimes security comes first. But in general, everyone is trying
to get stuff done and security gets in the way. My passwords get in the way of
using my devices. My keys get in the way of coming home.

That’s different from saying “security should be an afterthought”. Security is
something you should consider consciously and prioritize against your other
goals.

Putting security first is kind of weird if you think about it. Imagine
building a house and prioritizing the locks.

~~~
xref
Several billion people don’t have easy, immediate access to your house locks.

------
anderspitman
I work in a bioinformatics lab that develops data visualizations for genetics.
A typical file for a whole human genome is in the 50-100GB range. These files
are heavily indexed and streaming-friendly, but still rather unwieldy to work
with.

Our most common flow involves files stored on S3. The app will download the
index, then use sampling and streaming techniques to pull from the full file.
This works well because S3 supports range requests.
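For illustration, a ranged read against a public S3 object needs nothing but the standard library (a sketch; the bucket URL in the comment is made up):

```python
import urllib.request

def range_header(start, end):
    """Build the value for an HTTP Range header covering bytes [start, end]."""
    return f"bytes={start}-{end}"

def read_range(url, start, end):
    """Fetch only a byte range of an object; S3 answers 206 Partial Content."""
    req = urllib.request.Request(url, headers={"Range": range_header(start, end)})
    with urllib.request.urlopen(req) as resp:
        return resp.read()

# e.g. pull just the first 64 KiB of a hypothetical object:
# chunk = read_range("https://my-bucket.s3.amazonaws.com/genome.bam", 0, 65535)
```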

Where S3 falls down for us is sharing. I'm not aware of any easy way to share
S3 files with email. You either have to create a signed URL (which has a max
expiration), make the file public, or use IAM, which I'm pretty sure requires
all users to have AWS accounts. That's a non-starter.
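For reference, generating such a signed URL looks roughly like this with boto3 (a sketch; the client is assumed to be already configured, and SigV4 presigned URLs are capped at 7 days, which is the max-expiration limitation):

```python
def share_link(s3, bucket, key, days=7):
    """Generate a presigned GET URL from a configured boto3 S3 client.

    SigV4 presigned URLs cannot live longer than 7 days, so a "permanent"
    share link has to be regenerated (or automated) on that cadence.
    """
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=days * 24 * 3600,
    )

# import boto3
# url = share_link(boto3.client("s3"), "my-bucket", "genome.bam")
```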

Google Drive is the opposite. It's better for sharing. Just drop in an email
address, or share a permanent link. But it no longer supports range requests,
so it's essentially walled off from the rest of the web.

What's crazy to me is that neither of these services provides a
non-authenticated way to use a CLI tool to download/sync a directory tree,
even if it's public. rclone is probably the best tool in this space, and it
requires you to configure "remotes" ahead of time. You can't just point it at
a public bucket and have it download.

I think there's still room for improvement and competition in the cloud
storage space.

~~~
sirsar
Two ideas come to mind:

- You can create public pre-signed URLs with an unlimited expiration date
using SigV2. I doubt SigV2 will stop working anytime soon.

- Or, if you want the link to provide access to anyone, why not make the
object public, but put a UUID in the key name to prevent enumeration?
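A sketch of the second idea (the bucket name is hypothetical, and the upload step assumes boto3): a random UUID in the key is what makes the public URL unguessable.

```python
import uuid

def unguessable_key(filename):
    """Prefix the key with a random UUID so the public URL can't be enumerated."""
    return f"shared/{uuid.uuid4()}/{filename}"

# Then upload it publicly readable (bucket policy permitting), e.g. with boto3:
# import boto3
# key = unguessable_key("genome.vcf.gz")
# boto3.client("s3").upload_file("genome.vcf.gz", "my-bucket", key,
#                                ExtraArgs={"ACL": "public-read"})
# print(f"https://my-bucket.s3.amazonaws.com/{key}")
```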

~~~
yongjik
You have to be careful. By default, the bucket owner pays for outbound
traffic, which can get really expensive: $0.09/GB for the first 10TB. (And
there's no caching!)
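Back-of-the-envelope, with made-up but plausible numbers for the genome files mentioned upthread:

```python
# Sharing one 75 GB genome file with 100 collaborators, all egress falling
# in the first 10 TB pricing tier (us-east-1 internet egress):
gb_per_download = 75
downloads = 100
price_per_gb = 0.09  # USD/GB

cost = gb_per_download * downloads * price_per_gb
print(f"${cost:.2f}")  # $675.00 for a single round of sharing
```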

------
nickcw
I know the S3 API quite well through hacking on rclone and I have to say it is
pretty good as object storage APIs go.

If I could add one thing to it, it would be a way of listing a bucket to get
the objects _and_ the metadata back at once - it is another HTTP request to
get the metadata otherwise.
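The workaround today looks something like this sketch (the client is assumed to be a configured boto3 S3 client; note the extra HEAD round trip per object):

```python
def list_with_metadata(s3, bucket, prefix=""):
    """List objects, then issue one HEAD request per object for its metadata.

    ListObjectsV2 returns key, size, ETag, etc., but not user metadata,
    so every object costs an extra round trip.
    """
    results = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            head = s3.head_object(Bucket=bucket, Key=obj["Key"])  # extra request
            results.append({"Key": obj["Key"], "Size": obj["Size"],
                            "Metadata": head["Metadata"]})
    return results
```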

The large object handling is pretty good (unlike, let's say, the OpenStack
Swift API, which has two ways of uploading large objects, both of which leave
all the hard parts to the caller!). Large object handling (say, objects over
5GB) is the Achilles heel of all the cloud storage systems.

The myriad ways of authenticating with S3 (which have grown like Topsy over
the years) are a significant complication. However, that is the same for all
of the enterprisey cloud storage solutions (e.g. Azure Blob Storage).

There are quite a few re-implementations of the S3 API (Ceph, Wasabi, MinIO,
Alibaba Cloud, etc.), all of which are more or less 100% compatible (though
the rclone integration tests have winkled out a few small differences, mostly
to do with handling of characters like LF and CR).

So: impressive work, AWS, making an API which has stood the test of time and
become a de facto industry standard.

~~~
gandreani
Thanks for your work! I'm currently using rclone to back up my camera photos.
It's lovely software.

Also thanks for implementing sync to different storage providers. It's
probably tempting to just implement the big players, like S3, but little ol'
people like me appreciate the rest :)

~~~
nickcw
You are welcome!

It has been a voyage implementing so many cloud providers and keeping them
working. You might be surprised how much of the code is tests, which get run
against the cloud providers every night in the integration suite!

------
wallflower
AWS S3 is a testament to TLA+ and formal methods.

[https://lamport.azurewebsites.net/tla/formal-methods-amazon.pdf](https://lamport.azurewebsites.net/tla/formal-methods-amazon.pdf)

------
ktpsns
As a developer, I have never used S3, despite having, of course, heard of it
thousands of times. I'm kind of traditional: "Cloud" still means "just another
person's hard drive" to me, and "serverless" means "just another person's
computer". Having a feel for privacy, I still run a NAS at home, and it's
terrible: it makes me really appreciate the amount of work put into S3.
Somehow, as a developer, I was able to avoid all that fancy cloud stuff. I
prefer self-hosted, but I'm bad at finding good arguments which support my
point nowadays :-/

~~~
dividuum
> "just another person's hard drive"

In general I'm with you, but that is exactly the main benefit, imho. I don't
have the nerves to operate my own redundant, transparently growing, reliable
storage system for my business. For use cases where privacy is important (like
backups, private user data, etc.), encrypting data before uploading limits the
privacy leak to metadata, which might be acceptable.

~~~
vorpalhex
The tools for doing this at "medium business" and below scale have gotten much
easier, especially with things like ZFS.

~~~
hjanssen
Which is still orders of magnitude harder than _create an account and pay a
bill_.

------
leafmeal
Eventual consistency is my biggest annoyance with S3.

~~~
dividuum
It seems Google is better at that:
[https://cloud.google.com/storage/docs/consistency](https://cloud.google.com/storage/docs/consistency)
(when compared to
[https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction...](https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#ConsistencyModel))

------
alex_young
This appears to be an advertisement.

~~~
Rafuino
It sure does seem to be one. Here's the author's sign-off at the bottom:

 _Forrest Brazeal is an AWS Serverless Hero and enterprise architect who has
led cloud adoption initiatives for companies ranging from startups to the
Fortune 50._

When I go to the AWS Heroes page, here's how a "Serverless Hero" is defined:

 _AWS Serverless Heroes are spirited pioneers of the AWS serverless ecosystem.
They evangelize AWS serverless technologies online, in person, and via open
source contributions to GitHub and the AWS Serverless Application Repository.
These Serverless Heroes help evolve the way developers, companies, and the
community at large build modern applications._

While the author of the blog doesn't work directly for AWS, it's in his
interest to drive more business to them, as he consults for businesses
bringing more of their needs to AWS.

------
tehjoker
This is the kind of article you get when people are motivated by money and not
by an objective desire to compare and contrast different approaches to a
problem.

------
hn2017
One recent headache: I had to download the JSON files loaded since yesterday
(about 15k of them) using Python. The issue is that the bucket had 20+ million
files, and the naming convention (folders) wasn't perfect, unfortunately.
There was no way to do this filtering on the server side. I had to download
the full file list with its metadata and then make calls to get each file
individually based on the metadata date.

Not a smooth process.
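The client-side filtering described above can be sketched like this (the listing entries are the dicts ListObjectsV2 returns; names are illustrative):

```python
from datetime import datetime, timedelta, timezone

def keys_modified_since(objects, cutoff):
    """Keep only listing entries whose LastModified is at or after cutoff.

    S3 has no server-side date filter, so the whole listing (20M+ keys in
    the case above) has to be paged through and filtered client-side.
    """
    return [o["Key"] for o in objects if o["LastModified"] >= cutoff]

# cutoff = datetime.now(timezone.utc) - timedelta(days=1)
# With boto3 (assumed): page through list_objects_v2 and feed each
# page["Contents"] to keys_modified_since, then GET the surviving keys.
```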

------
kjgkjhfkjf
Feature-wise, GCS is better. For example, it's much easier to set up a multi-
region bucket with GCS.

~~~
anderspitman
Something I've wondered about for a while: does GCS allow 3rd-party access via
OAuth? One of my biggest annoyances with S3 is you have to automate generating
signed URLs in order to allow apps to access files.

~~~
merb
It depends. You can invite somebody to your gcloud project and give them
access only to a specific bucket, which means they can access the content via
OAuth. Unfortunately it does not work in all cases, e.g. some hyperlinks do
not work reliably when you create static pages with relative links, because
after authentication it rewrites the URIs.

------
dvfjsdhgfv
Greatest? The most expensive, for sure.

~~~
votepaunchy
The price has dropped from $150/TB-month at inception to $1/TB-month with Deep
Archive. Now competitive with RAID storage.

~~~
dvfjsdhgfv
I'm sorry, but there is no such thing as a fixed per-TB price in S3.

~~~
byteshock
If you want an S3-compatible service with lower pricing than AWS, there are
providers like Wasabi. They only charge for space used, not traffic.

I believe Wasabi currently charges $5.99 per TB per month, which is pretty
reasonable imo.

~~~
dvfjsdhgfv
That's way too expensive for my needs. I use a 10x10 TB server from Hetzner
for €200/month. Unlimited traffic is included if you use the standard 1 Gbps
link.

------
jacknews
At best, 'so far'.

I don't use it, so I can't comment on whether it's even the best at the
moment, let alone of 'all time'.

~~~
_asummers
"All time" colloquially never implies that something will be the best forever.
Being the best at a specific point in time puts you in the running, but when
talking about bands, sports teams, tech, whatever, it does not discount the
possibility of improvement in the future.

