Uncomfortable AWS Truths (twitter.com)
247 points by mooreds 30 days ago | 86 comments

> AWS purportedly puts design documents forward in the form of six-pagers. They start meetings with a 20 minute silent reading session. It's like the book club from hell.

Not just AWS; that's an Amazon-wide technique. And it's freaking amazing. You should try it.

Look, there are two other options. 1) Let someone bullshit you with PowerPoint for an hour and skim over critical details. 2) Send out the doc ahead of time for everyone to read before the meeting, and have every single attendee say "ah yeah, I only had time to skim it, but looks good to me." Wasted time.

The Amazon design review meeting involves no bullshitting, no homework before the meeting that gets skipped. We all arrive. We read the entire doc, on paper, red pens in hand. Then we dive into questions on the important bits and anything circled by those red pens.

Generally takes a lot less time as far as meetings go.

[Bias note: I've been at Amazon, but not AWS, for 7 years. These are my own opinions and not company statements]

Perhaps I’m just a particularly slow thinker, but 20 minutes is not enough time for me to read 6 pages of highly technical documentation and consider all of the implications and consequences of the proposed technical design. Even a full one-hour meeting is insufficient. I need at least a day. I don’t want to come out of the meeting and, two hours later, have a “l'esprit de l'escalier” moment about a flaw in the design I just approved.

If I’m being abusively frank, if you can’t be trusted to read a 6 page document the day before a meeting and put some serious thought into it, you’re not doing your job.

> If I’m being abusively frank, if you can’t be trusted to read a 6 page document the day before a meeting and put some serious thought into it, you’re not doing your job.

I used to think this and changed my mind for a few reasons. When you have people with heavy resource contention, it’s actually difficult to set aside 20 minutes to read a document. People are busy; it’s not just jerking around wasting time. It’s also a lowest-common-denominator problem: if one important decision maker wasn’t able to read it, everyone is stopped. And I’ve found that the more important and busy the person, the less likely they are to read beforehand.

Also, what can the meeting organizer do when people don’t read? Saying “you’re not doing your job” over and over doesn’t really solve the problem at hand: making a decision. At least with reading time, it’s productive.

I always send out the document beforehand, so perhaps one day everyone will read it in advance. One day, maybe. Reading and researching before the meeting is personally helpful, so it rewards the people who do prep.

Typically a 6 Pager is:

1 Page of Press Release and FAQ

2 pages of arch overview

2 pages of business justification

1-2 pages of appendix

Or some mix thereof. 20 minutes is a bit short but doable. Generally, at the 20-minute mark, the person running the meeting asks "need more time?" and it can go longer.

For a design review I ask for at least 2 hours for the meeting. Or more.

Doing everything in the meeting greatly respects everyone's time. It's a practice I'm trying to get applied consistently at the subsidiary I work at; often we don't follow it, because people have trouble scheduling the meeting.

See, that’s what I don’t get. A 2-hour synchronous meeting, to me, is a waste of resources. (Side note: if you’re going to destroy an engineer’s morning (or afternoon) with a 2-hour meeting, you may as well make it a 4-hour meeting and jam out a POC.)

A day (or week) to review on their own availability, plus the asynchronous feedback during that day (or week), means that the synchronous meeting can conclude in 20-30 minutes (if it's necessary at all), with more thought put into it than any two-hour review.

In the end, the asynchronous method of syncing state between people may take more overall time, but that time is taken from when a person is naturally not engaged with their current project. A meeting, on the flip side, occurs whether they were in the zone or not.

Yes and no. In theory the asynchronous mode is more efficient, but then it might mean you never get the ACKs from all the people.

If the async mode works, there's no need for the meeting at all; do it over email or in GDocs.

The problem with the sync mode is that inevitably you're wasting someone's time, at best only yours as you wait for others to read.

It's probably still much more time than people would spend reading it if it were sent to them beforehand. Maybe quickly skimming it for five minutes. And not necessarily due to malice, but because most people have a hard time setting those 20 minutes aside when they have 50 other things to do and already know they can get away with those five minutes and some guesswork during a PowerPoint presentation.

I wish my company would adopt this... I've been in too many meetings without clear agendas, and meetings for meetings' sake.

> Despite no fewer than 6 attempts to patch the Open S3 Bucket problem, it remains. You can't patch people--legally, anyway.

Heh, I use S3 for hosting static sites only.

2 weeks ago they sent an email saying "[...] your AWS account xxxxxxxx has one or more S3 buckets that allow read or write access from any user on the Internet."

And I go "oh shit, what has write access set up?".

But nothing did. When they said "read or write", they meant one, the other, or both. They just sent the same ambiguous email to everyone.
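The email doesn't tell you which. A quick way to check for yourself is to look at the ACL grants on each bucket; the function below classifies grants into the public permissions they actually confer. (The grant dictionaries mirror the shape that boto3's `get_bucket_acl` returns; the sample data is illustrative, and no AWS credentials are needed to exercise the logic.)

```python
# Classify S3 ACL grants to see whether a bucket is publicly readable,
# writable, or both. The grant dicts mirror the shape returned by
# boto3's get_bucket_acl(); the sample bucket below is hypothetical.

ALL_USERS = "http://acs.amazonaws.com/groups/global/AllUsers"

def public_access(grants):
    """Return the set of public permissions ('READ', 'WRITE', ...) in ACL grants."""
    perms = set()
    for grant in grants:
        grantee = grant.get("Grantee", {})
        if grantee.get("Type") == "Group" and grantee.get("URI") == ALL_USERS:
            perms.add(grant["Permission"])
    return perms

# Example: a static-site bucket that only grants public READ.
grants = [
    {"Grantee": {"Type": "CanonicalUser", "ID": "owner-id"}, "Permission": "FULL_CONTROL"},
    {"Grantee": {"Type": "Group", "URI": ALL_USERS}, "Permission": "READ"},
]
print(public_access(grants))  # {'READ'} -- read-only, despite the scary email
```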

That's so AWS it hurts.

"Surely this AWS service can't be as poorly integrated with that other AWS service as it seems, because if it were that poorly integrated, it would be almost completely useless."

"Oh. It is. FML."

I got the same email about S3 and some lambdas, with a link to fix them, that went to some generic page. You'd think Amazon could do better. Why not link me right to the bucket's configuration page? Why not list all the problem buckets in the email?

BTW, the email was sent at 3:45AM EST (my time zone), or a bit after midnight Seattle time.

Dunno if there can be a wrong time to send an email, but I usually like to send important things like resumes around 9AM so it's at the top of the morning inbox.

Shouldn't let the engineers write the emails...me included.

I had so many clients and people in my company ask me about this. The wording was horrible. It would have been better to identify things properly or, at the least, call it out more clearly.

Same here. I checked everything over a few times thinking I'd missed something.

Hah, I got the same one and panicked!

Most businesses can probably run on a single server from OVH (see Paul Tyma's Mailinator architecture for inspiration: 1.2 billion emails on a single server, http://highscalability.com/mailinator-architecture), but I suspect a lot of folks in tech want to pad their resume with cloud buzzwords, so they recommend overcomplicated architectures to businesses that they absolutely wouldn't spend their own money on. So much of tech design, nay, tech thinking, is based on following fashion trends, it's shocking.

It's really good machine sympathy for keeping the unfunded service cheap to run, but it's one bad fan bearing away from losing a lot of messages, so I wouldn't say its design assures four nines. And the optimization work to get all the way down to n=1 wouldn't have paid off if Tyma needed to hire someone at market rate to build that and maintain it, because an engineer-month costs hundreds of instance-months.

Honestly, physical servers have better uptime than some of Amazon's regions.

YMMV. I've seen an adtech company that purchased hardware to install in colos, and a couple of years later their failure rate was almost 5%/year. I would never bet on four nines from a single commodity box.

What you describe is both a problem and an opportunity. We live in strange times when people pay through the nose for something that could easily be implemented on a dedicated server in a much simpler way, with way less risk, for a fraction of the original amount. But tech goes in circles, and I bet we're going to see some "miraculous discoveries" of how companies saved millions by migrating from AWS to dedicated servers.

My last full-time gig did this. They don't really talk about it, but it's a long-established tech company in San Francisco. Saved a lot of money doing so.

We run servers in colos in a bunch of POPs. We've tried to do the "how much would this cost in AWS" a few times now over the years, and it's hilarious how much staggeringly more expensive it is. Even if you do stupid things like assume hardware will depreciate over 2 years (in real life, try 5-10) AWS always ends up being an order of magnitude more expensive. And it's not really business-model specific: we've analyzed compute-heavy/bandwidth-heavy/HA-esque projects and it's always insane how much they get away with charging for services.
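The depreciation assumption really does swing these analyses. A back-of-the-envelope sketch of the amortization math (every number here is a made-up illustration, not a real quote):

```python
# Back-of-the-envelope colo vs. cloud comparison. All figures are
# hypothetical -- the point is how the depreciation window changes
# the effective monthly cost of owned hardware.

def colo_monthly(hardware_cost, depreciation_years, colo_fee_monthly):
    """Amortized monthly cost of owned hardware running in a colo."""
    return hardware_cost / (depreciation_years * 12) + colo_fee_monthly

server = 8000.0    # one beefy box (hypothetical price)
colo_fee = 150.0   # rack space, power, bandwidth (hypothetical)

pessimistic = colo_monthly(server, 2, colo_fee)  # 2-year write-off
realistic = colo_monthly(server, 5, colo_fee)    # 5-year write-off

print(f"2-year depreciation: ${pessimistic:.2f}/mo")
print(f"5-year depreciation: ${realistic:.2f}/mo")
```

Even under the "stupid" 2-year assumption the monthly figure only rises by the extra amortization; stretch it to the 5-10 years hardware actually lasts and the gap versus on-demand cloud pricing widens further.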

Maybe if you're in the "under $1k/mo" range it makes sense to microservice in AWS, but even then VPS hosts are so much cheaper and easier to use.

The difference between the needs of Mailinator ("Email can reside in RAM because it is temporary (3-4 hours)") and most corporate e-mail services are... large. If you treat e-mail as ephemeral, sure, you can run on a single box somewhere, and in the case of a failure, a stateless recovery will have the service up and running again.

Meanwhile, if you're a business, you have actual compliance requirements on how long you have to keep an e-mail around.[1] SarbOx says you have to keep documents related to insider dealings _indefinitely_. Do you, the sysadmin, know which e-mails those are?

So you need backups. If you aren't testing your backups, you might as well not have backups. So you need a second server sometimes to test backup restores on. Do you deal with HIPAA protected information? If so, now you have a bunch of compliance requirements about the security of those e-mails and those backups you just made.[2]

It turns out that "you can save hundreds of dollars a month on e-mail by introducing an existential risk to your business in the event of a lawsuit" is not a great value proposition for most businesses, and most businesses are better off just going with Office 365 or Google Apps for Business, which have dedicated compliance officers and certifications for all of these issues and a lot that I haven't named.[3][4]

Yes, it is possible to overarchitecture things. Yes, there are CIOs and CTOs who want to cargo cult their way to success by imitating successful cloud migrations done by others. But there are a lot of real business problems out there that can be solved by doing things differently than they were done 10 or 20 years ago.

1) https://www.intradyn.com/email-retention-laws/ 2) https://www.hipaaguide.net/hipaa-email-compliance-requiremen... 3) https://docs.microsoft.com/en-us/office365/servicedescriptio... 4) https://support.google.com/googlecloud/answer/6056694?hl=en

I think you are right. And I started out doing just that with my business. I could easily run the core of the business on a single server. But I have found that what really starts requiring scalability are all the add-ons. Then we needed caching with redis, then elastic stack for logs, then prometheus for monitoring and on and on. Also, OVH doesn't have much presence in the U.S.

> Amazon's managed ElasticSearch offering is awful because it's ElasticSearch.

Had I never used ES or ELK, I wouldn't even bat an eyelash. But man... this one hurt me in the tech feels. I already don't like ES when it's on premise; I can't imagine it in the cloud, where you have even less control of it.

I've hit basically every limit on CloudSearch when I was at a previous role. We increased the cluster count and index sizes to their 'absolute limits'. And we were still projected to blow through them by 2020.

I believe there's a project underway now to move it to ElasticSearch, but still, CloudSearch is and should always be considered a prototyping tool for proper ES implementations.

Is ES one of these necessary evils? I have been looking for a product similar to it, but it seems ES is always recommended.

No, ES is an affordable entry-level search engine in a very expensive product space.

But there is nothing new in the standalone search engine space that I am aware of. Many engines have been gobbled up and integrated into larger product suites (see: Oracle Secure Enterprise Search).

The problem with ElasticSearch is that it does multiple things at once. It indexes documents. It stores documents. It searches indexes. It serves documents. The indexing and searching should be separated from the storage and serving at any kind of reasonable scale.

But, ES makes it “just work” at the proof of concept and low volume stages, and you can’t easily back out of it when you reach its limitations.

For all that people (rightfully) give them grief for being expensive, this is something Splunk got right. You can put your entire Splunk infrastructure on one host: like stem cells, a single Splunk box can perform any of the required functions (ingest, index, search). But you can also separate out each layer into separate hosts and then cluster them, so you have clusters of ingesters feeding clusters of indexers, accessed by clusters of search heads. That should be the architecture ES aims for, where people are given the flexibility to take whichever function is currently a pain point and break it out to a separate set of nodes.
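For what it's worth, Elasticsearch does expose per-node role settings that approximate this split; whether they deliver Splunk-grade separation in practice is another question. A sketch of the relevant `elasticsearch.yml` lines (one role per node; syntax varies a bit by ES version):

```yaml
# elasticsearch.yml sketches for a role-separated cluster
# (node.roles syntax from ES 7.9+; older versions use the
# node.master / node.data / node.ingest booleans instead)

# Dedicated master-eligible node: cluster coordination only
node.roles: [ master ]

# Dedicated data node: holds and searches the indexes
# node.roles: [ data ]

# Dedicated ingest node: runs ingest pipelines before indexing
# node.roles: [ ingest ]

# Coordinating-only node (empty role list): routes searches, merges results
# node.roles: [ ]
```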

Well, ES spent many years polishing itself, but there is one new very rough kid on the block:


ES is one of those technologies that "everyone" goes to when they want any kind of basic text search indexing "at scale" (usually with a delusional notion of what their scaling needs actually are), because text search/indexing is tedious and hard at any scale that a basic DB query would fall over on (which is generally a much bigger scale than many want to admit).

I saw a post about this, and the other open source alternative was Solr.

Not sure about support/community though.

It's funny because under the hood, they're both Lucene.

I guess that's the power of open source, right?

I don’t find ES that bad, considering its power, but I haven’t had the best experience with the AWS service. Could be user error though....

I don't fault ES for this necessarily, but the biggest problem with ES is, it's easy to build something that looks like it works, right until it doesn't.

This is a problem for most DBs. You put stuff in, you take stuff out, and it works... but you're putting it in the DB in a way that will make the queries you need nearly impossible to do within your real-world/production constraints (response time, resources required, etc.)

The thing is, ES is so different from most DBs that you need to relearn how to do things right. And to make matters a little more difficult, all DBs have a lot of knobs, but ES has way more knobs you need to know how to turn to get something that won't fall on its face in production.

I don't remember my experience with ES on AWS that well, besides that it was behind on major versions for a while at one point, and if you set up ES resources with CloudFormation... god help you if you need to roll them back.

Using serverless with ES, it looked something like:

  api-1
  api-2-because-1-is-frozen-rolling-back-cloudformation-es-instances
  api-3-because-2-is-also-frozen-rolling-back-cloudformation-es-instances
  api-4-...

Have you tried Elastic Cloud? It solves a lot of access problems present in AWS Elasticsearch, and it provides the most recent version for deployment.

Elastic Cloud has terrible security features - they removed IP whitelisting.

They do have user/password authentication, though, which barebones Elasticsearch doesn't ship with out of the box.

> If you've got an old account charging you 22¢ a month, don't get mad. Start a snarky Twitter account and make sure you cost them orders of magnitude more than that in doing damage control each month instead.

I use S3 for hosting static sites only, and only in North American zones.

Some time ago, I saw some billing lines stating:

"US West (Oregon) data transfer to Asia Pacific (Tokyo) 0.001 GB $0.01"

I had no idea why I'd be paying for transfer to a zone outside my own. Obviously I don't care about the 1 cent, but my small problem may be someone else's big problem.

Instead of looking into it, they refunded me a month of service (a few dollars).

I guess that's the opposite of @QuinnyPig's thought, but seriously, what was the charge for? Someone running their own crawler on EC2, so I paid for internal DC-DC charges?

>I guess that's the opposite of @QuinnyPig's thought, but seriously, what was the charge for? Someone running their own crawler on EC2, so I paid for internal DC-DC charges?

Yes. Other customers accessing your publicly available resources pay the internal AWS fees.

Which is nice in that it saves you money, and not nice in that it's not super intuitive; if you see it, you think you've got some resource sitting around somewhere that shouldn't be.

You got it. It's someone else on an AWS account somewhere. Maybe they're VPNing in. Maddening, no?

I saw that too and figured it was due to CloudFront.

Not using CloudFront, though.

' "AWS stole our open source project and turned it into a service!" is the rallying cry of people who suck at business models. '

Thanks, @QuinnyPig. I needed that laugh today.

> If you're a GM of a service at @awscloud, and you price it at a simple fixed fee of only $X per month, you can expect to be walked out that day.

As a micro-customer, I like the "Only pay for what you use" model.

But AWS charges fixed fees for Route 53. You have to pay 50 cents/month/domain when hosting static sites. My volume is small enough that I pay 4 x $0.50/month for Route 53 DNS, and like 29 cents total for the actual storage/transfer.

The profit margin on that 50 cents/month must be 99%.
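Putting the comment's own numbers together shows how thoroughly the fixed fees dominate this particular bill:

```python
# The commenter's own figures: four hosted zones at $0.50/zone/month,
# versus about $0.29/month of actual storage and transfer usage.

zones = 4
zone_fee = 0.50   # Route 53 hosted-zone fixed fee, USD/month
usage = 0.29      # S3 storage + transfer, USD/month (from the comment)

fixed = zones * zone_fee
total = fixed + usage
print(f"fixed fees: ${fixed:.2f} ({fixed / total:.0%} of a ${total:.2f} bill)")
```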

Why use Route 53 vs. your registrar’s DNS? Or use Cloudflare; it’s free.

I thought I couldn’t use my registrar’s DNS unless I wanted it to redirect to the weird URL for the S3 bucket. But maybe I’ve got it all wrong and just took someone’s S3 static-site build instructions too literally 5 years ago.

No, you can set a CNAME to the S3 bucket's URL in your registrar's DNS settings...
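In zone-file terms, the record looks something like this (the domain and region are placeholders; for S3 static website hosting, the bucket name must match the hostname):

```
; Point www at the S3 website endpoint. example.com and us-east-1 are
; placeholders; the bucket must be named www.example.com for this to work.
www.example.com.  300  IN  CNAME  www.example.com.s3-website-us-east-1.amazonaws.com.
```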


You can also use CloudFront to do SSL if you need it.

Note: if you really only need to host a static website and want free SSL, GitHub Pages will give you all of that. https://pages.github.com/

There are definitely some odd things you need to do with page rules to get it working properly with HTTPS, but it is possible. I'd ping CF support or Google it.

I always thought $0.50/month was expensive for that.

Corey Quinn is my spirit animal. He calls himself a stand-up cloud comedian.

For the uninformed (myself included): who is this guy?

Is or was he ever affiliated with AWS?

> Is or was he ever affiliated with AWS?

Does not appear to be affiliated, but he launched "Last Week in AWS" more than 2 years ago, so he may know a thing or two.

I have never worked at AWS, largely due to my personality.

I think I'd currently have trouble working for AWS as well...

Man, you are killing it with these tweets. This is genius.

Quite true, and at the same time hilarious. Does anybody use AWS WorkDocs?

Best takes (so far):

  - Nobody has figured out how to make money from AI/ML other than by selling you a pile of compute and storage for your AI/ML misadventures.
  - "AWS stole our open source project and turned it into a service!" is the rallying cry of people who suck at business models.
  - Amazon's managed ElasticSearch offering is awful because it's ElasticSearch.
  - A major reason to go public cloud that @awscloud can't say outright is "you people freaking suck at running datacenters."
  - Route 53 isn't really a database, but then again, neither is Redis.
  - MultiCloud is a good idea if you're tetched in the head; it treats cloud solely as "a place to run a bunch of VMs." If that's all you're doing, go you I guess. Bring money!
  - Reserved Instances are the best way to take the on-demand promise of the cloud, and eviscerate it completely by forcing customers to think of it like it's an ancient datacenter. "Enjoy your three year planning cycles, schmucks!"
  - Baby seals get more hits than the [AWS] forums do.
  - "You should deploy everything to be HA across multiple regions" is the rallying cry of armchair architects who don't pay their own AWS bills by a long shot.
  - "What does AWS have that GCP doesn't?" "A meaningful customer base"
  - There's only one place to see every resource in your AWS organization, in every region: the AWS bill.
  - DocumentDB isn't a perfect MongoDB clone yet, and can't be until it's just as good at trashing your production data.
  - Netflix has assembled many of the most brilliant engineers on the planet so they can... use @awscloud to stream movies. Draw your own conclusions.

My favorite, although one I have to take with a grain of salt:

"Despite all of the attention Serverless, AI/ML, etc. get on stage, the majority of AWS's income comes from EC2."

> The purest form of "static site" is the @awscloud status page

I'm convinced every company's status page is just static content hosted from an S3 bucket.

I mean, that's what it's supposed to be, except the joke is that AWS uploaded their page once in 2004 and hasn't changed it since.

How many retweets will it take to get this for GCP?

I just retweeted it just for that

Let's see how far he gets.

I wonder if the snark well will run dry. Hope not.

Corey's business partner here. I can attest: it does not. I sometimes wish it would...

Half of it's trying to get your humorless stoneface to crack a smile. We're approaching dangerous levels of snark, with limited success to date...

I optimized costs by replacing my face with a static site.

The well ran dry! Awe-inspiring: it was like the immovable stone meeting the irresistible force.


I'm really here for the dunking on IBM Cloud. Five months there stole literal years from my life through the power of condensed frustration.

And it still ain't a real cloud.

> https://totes-not-amazon.com

It's like the Git Man Page Generator [0] but trained on the AWS docs. Each time you click the "ASW" home page logo, it regenerates the docs.

[0] https://git-man-page-generator.lokaltog.net/

As this thread will draw people that know 'the cloud', I'm wondering how I could get experience learning 'the cloud'.

Is it as simple as taking each individual service listed in the AWS console and learning exactly how it works, or is there something more in-depth that matters?

Integration between the various services. Any individual service is pretty straightforward, but bringing all the pieces together (security, networking, etc.) can be a challenge. Billing is also a big deal: your solution can end up being very expensive.

A better way to learn it, IMO, is to come up with some project, and implement it. For example, a typical webpage with a backend db, some storage, DNS, maybe load balancing. Avoid EC2 options, only because that's too easy (it's just a VM).

It won't be incredibly difficult, but it isn't as easy as spinning up a VM on your local machine.

Hope this helps.

Getting experience in anything is using it enough to figure out all the ways in which it sucks.

> Nobody has figured out how to make money from AI/ML other than by selling you a pile of compute and storage for your AI/ML misadventures.

Thinking more about it: have companies made much money from ML, other than analytics firms, self-driving companies, and Google?

This is really funny, but I'm curious for people in the know on this: if you were starting from scratch (so no legacy), would GCP or Azure be better? As far as I can tell AWS's main advantage is cost -- is that fair?

Depends on your budget, your risks, your technology sophistication, your location, and the variation/cycles in your compute & storage requirements.

Recently from the same person this story[1] on the minefield that is AWS costs for data transit.

So, whether the 'Main advantage is cost' is accurate for you is very much an 'It depends' proposition.

[1] https://news.ycombinator.com/item?id=20972687

My most uncomfortable truth about AWS is that bandwidth is crazy expensive and it does not get cheaper over time like just about everything else does.

Wow ... this is like the AWS portion of my subconscious mind spilled out all over Twitter.

>Everyone likes to make fun of outages in us-east-1 that break the internet, but Azure takes outages and everyone's websites all stay up. One wonders why.

huehuehue, doesn't the Have I Been Pwned service use Azure? I can't recall it ever being down due to Azure outages...

We use autoscaling groups and spot instances.


Great marketing
