
AWS status updates not working due to S3 - joshua_wold
https://twitter.com/awscloud/status/836656664635846656
======
simplehuman
It baffles me that AWS, a leader in cloud computing, can make such a
rudimentary mistake. Seriously, I interviewed there and they asked me to write
a b+ tree and I failed. And then you see fundamental errors like this, which
cannot possibly be made by people who had the smarts to write b+ trees in 15
minutes...

I want to take this opportunity to complain about the interview system. Hire
people who care about the product and company. Such mistakes cannot be made by
people who care.

~~~
wheaties
Writing a B+ tree from memory and making sure your infrastructure isn't doing
something stupid are fundamentally different skills. One requires that you
regurgitate the contents of a text book on a white board, the other that you
can engineer a solution. I wish them well on an interview set up for hiring
the former; I try to hire the latter.

~~~
simplehuman
> Writing a B+ tree from memory and making sure your infrastructure isn't
> doing something stupid are fundamentally different skills.

It's funny. You know it. I know it. All of HN knows it. And yet _no_ interview
follows any such common sense rules. Just go to a Google/FB interview and they
ask you all sorts of questions. It doesn't matter what you are interviewing
for. In fact, in many cases they don't even tell you which group/team/project
you will be assigned to. Since they will "assess" where you fit best.

~~~
rifung
> Since they will "assess" where you fit best.

Disclaimer: I work for Google

At Google this just isn't true unless we are talking about new grads. After
you pass the interviews, you have to do team matching, where you will have
informal 2 way interviews with prospective teams. Only after you find a team
that you like and that likes you can you get an offer, assuming your
application is approved.

It's definitely not a case of sticking you in a team without your input. It's
true that at the interview stage you won't know, but by the time you have an
offer you know what team you'll be on.

I believe at Facebook you have even more freedom. You get to go to boot camp
for 3 months and after that get to choose what team you want to be on. I
haven't worked there so this is just based off what recruiters and friends
have told me though so hopefully someone else can correct me or elaborate.

~~~
simplehuman
Thanks for the reply. I am talking about the interviewing stage. I have more
than 10 years of engineering experience but the Google HR actually sent me a
PDF of all the topics I should be well versed in. The booklet was basically my
entire grad course and masters and more. I can confirm from more than 5
sources that this is the case with Google interviews.

~~~
jaredsohn
You stated two things: the fact that Google asks algorithms questions and
"Since they will "assess" where you fit best". The GP was refuting the second
of these things (the GP even quoted it) while you were defending the first
statement.

BTW, I think that second point is a common misconception that deserves
rebuttal because I think that until five years ago or so Google wouldn't
tell you which team you would join (or give you much choice) before deciding
on an offer, even if experienced.

------
joshuak
So now you know that a deadman switch is the better way to report
availability. The logic was backwards for this signal. The default condition
is failed. Not failed requires proof.

It's interesting how easy it is to accidentally invert logical operations. I
see it in code all the time. A condition will test that A is true when what
they really need to know is if B and C are both false. It's like some kind of
cognitive tic.
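
A rough sketch of what a deadman switch could look like, assuming the service sends a periodic heartbeat and the status side only reports "ok" while a recent one is on record. The class name, the `timeout_seconds` parameter and the "ok"/"failed" strings are made up for illustration, not anything AWS actually uses.

```python
import time


class DeadmanStatus:
    """Reports "failed" unless a recent heartbeat proves the service is up."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self._clock = clock            # injectable so the logic can be tested
        self._last_heartbeat = None    # no proof of life yet

    def heartbeat(self):
        """Called by the monitored service to prove it is still alive."""
        self._last_heartbeat = self._clock()

    def status(self):
        # Default condition is failed; "ok" has to be earned by a fresh heartbeat.
        if self._last_heartbeat is None:
            return "failed"
        age = self._clock() - self._last_heartbeat
        return "ok" if age <= self.timeout_seconds else "failed"
```

The point of the design is that silence reads as failure: if the service, the network, or the reporting path dies, the status flips to "failed" without anyone having to send a "we are down" message.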

~~~
ryanbrunner
That's good practice, sure, but their problem was even more fundamental than
that. Their status page was dependent on the service it was reporting on being
up. That fails the most basic requirement of a status page.

------
Cafey
This should be the official anti-pattern when designing a status page.

~~~
qeternity
It is...it's literally the reason products like statuspage.io exist, because
if your status page has any dependencies on the services for which it provides
statuses, then it's not really a useful status page.

~~~
simplehuman
And yet, I cannot find any obvious information on where statuspage is hosted.

~~~
asp2insp
Builtwith seems to think they are hosted on EC2
[https://builtwith.com/statuspage.io](https://builtwith.com/statuspage.io)

~~~
manojlds
Their status page says so anyway -
[http://metastatuspage.com/incidents/lb3rpt031vmx](http://metastatuspage.com/incidents/lb3rpt031vmx)

------
fred256
Looks like they've fixed it now. (The status page, not s3)

~~~
joshua_wold
yup -
[https://twitter.com/awscloud/status/836662601090134017](https://twitter.com/awscloud/status/836662601090134017)

------
BrailleHunting
This is a problem of "monoculture" dependencies and failure to implement HA by
using multiple services. All Github releases are down, atom downloads are down
and so on. Companies, including Amazon, should be using other CDNs for HA
purposes, even if NIH.

It's a similar mistake of making DNS a dependency for monitoring/control
infrastructure when DNS is down.

~~~
ghaff
Assuming that it actually makes business sense to do so. There are certainly
cases where you can make a perfectly rational business decision to depend on
someone else's services and you're OK with your uptime not being any better
than their uptime.

------
alpb
> The dashboard not changing color is related to S3 issue.

I don't understand this. The icon URL is in the HTML. Both icons
[https://status.aws.amazon.com/images/status0.gif](https://status.aws.amazon.com/images/status0.gif)
and
[https://status.aws.amazon.com/images/status3.gif](https://status.aws.amazon.com/images/status3.gif)
have been working for us all along. Plus clearly they are able to update the
status page contents, because they added the "increased error rates" message
there too. I don't want to believe it but is it fair to assume they did not
want to replace status0.gif with status3.gif in HTML? Please correct me if I'm
not getting this straight.

In any case, it's a bad day for AWS folks, I'm feeling their pain too. Being a
cloud provider is a tough business to be in and the pressure is really high.

~~~
ryanbrunner
One explanation might be that they use an internal tool to update the status
page definitions, and parts of that tool are hosted on S3. Or that the status
definitions themselves are hosted on S3 (and then read and transformed into
the HTML page everyone sees)

------
brational
Drone crashes into my living room with groceries. Receive email that my
package was successfully delivered.

~~~
throwaway29292
I would hate to imagine drones dropping out of the sky if S3 went down in the
future.

------
cwmma
So the obvious answer would be to host it on like azure or google cloud
storage but I can just imagine the institutional push back that would get
trying to do that.

~~~
idlewords
What if I told you you could make a red dot without hosting an image anywhere?
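
For instance, a dot drawn with inline CSS needs no image file and no external stylesheet. This is just a sketch; the `status_dot` helper and the three allowed colors are hypothetical.

```python
def status_dot(color):
    """Return an HTML snippet that draws a colored dot using inline CSS only."""
    allowed = {"green", "yellow", "red"}
    if color not in allowed:
        raise ValueError(f"unknown status color: {color!r}")
    style = (
        "display:inline-block;width:12px;height:12px;"
        f"border-radius:50%;background:{color}"
    )
    # No <img> tag and no URL, so nothing has to be fetched from storage.
    return f'<span style="{style}" title="{color}"></span>'
```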

~~~
doubleplusgood
Red dot as a service?

------
ohstopitu
Just to be clear...best practices with designing status pages:

1\. ensure it does not depend on your infra (if your api server goes down - it
should not take down your status api with it)

2\. make sure your service reports to your status page instead of your status
page looking for the service.

3\. redundancy for your status page?

Anything anyone else wants to add?

~~~
janywer
Not sure about 2 - What are your arguments for this?

If the status page relies on getting updated information from the service, it
may not even notice when the whole thing just crashes and goes down in flames.
Attempting to do some predefined calls to the service to evaluate whether it
is working correctly appears like a better solution?

~~~
djsumdog
Yea I was wondering about that comment too. I mean you can do both. Your
status page should be static, updated by a service which both polls and
accepts information from your services. You ideally want to go yellow if one
of the two fails.
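
Something like this, as a minimal sketch: one signal is pushed by the service, the other comes from polling it, and the page color is derived from both. The function name and the choice of "red" when both signals fail are my own assumptions.

```python
def combined_status(push_ok, poll_ok):
    """Combine a pushed health report and a polled probe into one page color.

    push_ok: True if the service recently reported itself healthy.
    poll_ok: True if an external probe of the service recently succeeded.
    """
    if push_ok and poll_ok:
        return "green"
    if push_ok or poll_ok:
        # The two signals disagree: something is wrong, possibly the monitoring.
        return "yellow"
    # Assumption: both signals failing is treated as a full outage.
    return "red"
```

The result would then be written into the static page by the status service, which itself runs somewhere that does not depend on the thing being monitored.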

But yes, in general, the status page and status services should be entirely on
their own independent infrastructure; and in a different data centre. A number
of providers offer independent status page services. If your entire company
runs off Digital Ocean, your status page/services should probably be running
on Linode or AWS or whatever.

------
paulpauper
Ironic: the status update doomed by its own downtime.

