

Reddit has now been down for 30+ hours - scorpion032
http://www.reddit.com/?down
For the first time ever, the admins have pushed hand-curated content (where did they get that one from?) to the top.

It seems the community is well past the point of being angry when the site gets back up. They will just be glad that it is back at all. Need more donations, Conde Nast?
======
chuhnk
Being a sys admin and having to plan disaster recovery I can understand this
scenario. I've played it out in my mind, it's been done to death on paper and
we've tested it. My question is, why aren't reddit doing anything about it?
After numerous outages over the past few months it seems reddit's answer to
everything is, "It's not our fault, it's Amazon". In all honesty without
turning this into an attack that is ridiculous. If my company were to suffer
outages on a weekly basis, my boss and I would both be under the gun and
seriously before it even got to that point we would be restructuring systems
and shifting hosting platforms. Before anyone says to me, "hold on, you can't
just go moving infrastructure". BS. When your business is the website and you
are providing a service, availability is damn well key. I've moved datacenters
more than once, I've dealt with disaster recovery, I know how to do these
things transparently and before you start thinking I'm arrogant well actually
I only have 4 years of experience in the system administration game so I'd
expect people with more talent than me to be able to get the job done pretty
damn effortlessly. If our company suffered an outage of more than 3-4 hours
we'd be switching to our DR plan end of story. That means switching dns and
pushing to warm standby servers.
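
The rule above (an outage past a fixed limit means pointing DNS at the warm
standby) can be sketched roughly as follows. This is only an illustration: the
3-hour threshold is taken from the low end of "3-4 hours", and the hostnames
and function names are made up, not anyone's real setup.

```python
from datetime import datetime, timedelta

# Assumed threshold, taken from the low end of "more than 3-4 hours".
FAILOVER_THRESHOLD = timedelta(hours=3)


def should_fail_over(outage_started, now, threshold=FAILOVER_THRESHOLD):
    """Return True once the outage has lasted longer than the threshold."""
    return now - outage_started > threshold


def active_target(outage_started, now,
                  primary="primary.example.com",
                  standby="standby.example.com"):
    """Pick the host DNS should point at (hostnames are hypothetical).

    outage_started is None while the primary is healthy.
    """
    if outage_started is not None and should_fail_over(outage_started, now):
        return standby
    return primary
```

In practice the DNS change itself would go through whatever API the DNS
provider offers; that part is left out here because it varies by provider.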

Engineers of reddit, take some initiative. Move off AWS. I don't care how hard
it is, look at the outages you've suffered. Look at the problems you've had.
Haven't you figured it out yet? Get some physical hardware with Rackspace and
sort it out.

Please understand this comment is not meant to attack anyone but I am
frustrated with people shifting blame and not taking responsibility. If you're
down 30 hours then you have failed.

~~~
wdewind
Reddit has < 5 engineers working on it and in the last few months has been at
1 active engineer for a while. With 13mm unique visitors and > 1 billion page
views monthly let's cut them some slack.

~~~
chuhnk
Let me ask you. How does a company get into that state? Say you had 5
engineers, 1 billion page views a month and ongoing issues. Wouldn't you
dedicate some resources to that? When one engineer leaves, wouldn't you worry
a little? And then another, and another? Would you sit and do nothing? These
events did not occur in a day. They occurred over a lengthy period of time. I
can sympathize with the lone engineer because I've been there. I still sleep
with my laptop and phone next to my head. I know what the burn out feels like,
so yes I feel for that engineer. But as a business, they should never have let
it get to that point.

edit: JonnieCache mentioned Conde Nast will not let them hire more people.
Obviously we know where the fault lies then.

~~~
JonnieCache
_> JonnieCache mentioned Conde Nast will not let them hire more people._

Yeah, I imagine the situation is more complicated than that, but look into
some of the reddit blog posts and some of the admins' comments on here for more
details. There was a lot of stuff in the big 'AWS is down' thread about it,
something about CN giving them an unlimited operating budget but a tiny,
non-negotiable hiring budget. God knows.

~~~
groovylick
Gist I got from the 'AWS is down' thread was they had maxed their operating
budget and that was why they couldn't afford to use multiple regions. With
Reddit being very write heavy, the data synchronization across regions would be too
expensive in both bandwidth and manpower.

------
zck
I was wondering why I was logged in, able to comment and vote.

They're letting back in members slowly, with Reddit Gold members having a
larger chance of being allowed in:
[http://www.reddit.com/r/reddit.com/comments/gv63r/nobody_can...](http://www.reddit.com/r/reddit.com/comments/gv63r/nobody_can_login_or_vote_except_gold/)
. Be sure to read the comments by jedberg, one of the reddit staff.

Edit: Now the top of reddit says:

>UPDATE: We are slowly getting our capacity back and as such allowing a random
subsets of redditors access to the site as we increase capacity. Please check
back soon, as you may be able to log in if you are lucky. Thanks!
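
A minimal sketch of what "a larger chance of being allowed in" could look
like. This is a guess at the general idea, not reddit's actual code: the
function names, the `gold_boost` multiplier and the idea of scaling by a
capacity fraction are all assumptions.

```python
import random


def admit_probability(is_gold, capacity_fraction, gold_boost=3.0):
    """Chance of letting one user in, capped at 1.0.

    capacity_fraction: share of normal capacity currently available (0..1).
    gold_boost: hypothetical multiplier for Gold members.
    """
    weight = gold_boost if is_gold else 1.0
    return min(1.0, capacity_fraction * weight)


def admit(is_gold, capacity_fraction, gold_boost=3.0, rng=random):
    """Biased coin flip: Gold members win more often, nobody is guaranteed."""
    return rng.random() < admit_probability(is_gold, capacity_fraction, gold_boost)
```

As capacity_fraction rises towards 1.0, everybody gets in, which matches the
"increase capacity" wording in the notice.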

------
cygwin98
Maybe it's just me. I feel HN has become very sluggish this morning. Does that
mean lots of redditors come over here?

Edit: Weird, got downvoted. I do think this is related to the topic, so the
failure of Amazon can have a domino effect, say, AWS fails -> Reddit fails ->
HN sluggish/struggling -> ... Sometimes I really question the resilience of
the social networks, as most of them may not be designed to handle 2x or 10x
more load.

~~~
mlk
> Does that mean lots of redditors come over here?

I know I did.

------
llambda
It's amazing that AWS has been affected this long, or rather that their
customers have been; here focusing on one big customer, Reddit. But actually
what seems more relevant to me, vis-à-vis sites like Reddit, are services such
as Heroku which were or still are affected. Although I can appreciate that
Reddit is an extremely popular site, at the end of the day I'm more concerned
with the development services that were impacted.

------
random42
Btw, has Amazon commented about potential irrecoverable/lost changes/damages,
if any?

------
davidreiss666
They are sort of limping back up now. Some users -- seems to be mostly gold
users right now -- can get in.

~~~
scorpion032
It is reddit. Not experts-exchange. If it were "gold users can now login" that
would be bad.

~~~
ceejayoz
If they have to let users back in slowly (which they do), why shouldn't it be
ones that contributed financially to the site?

~~~
scorpion032
If it has to be based on a metric, it should be based on Karma. "Who
contributed the most to the site."

~~~
AgentConundrum
They had a bad outage around a year ago which was the first time I saw the
"emergency read-only" notice. Back then, the metric was "age of account". Even
though by that metric, I would have been one of the first users back, I really
don't mind that the Gold users are being let in on a "biased random" basis. As
ceejayoz said, "why shouldn't it be the ones that contributed financially"?

The three metrics mentioned so far all have issues, but I think I like yours
the least.

Gold - "Thanks for helping fund the site. Hopefully your dollars will let us
hire folks that will ensure this doesn't happen again." Age - "Thanks for
sticking by our little site through all of our outages. We know you've had to
put up with a lot, but we appreciate it." Karma - "Thanks for all the cats. We
love cats."

I realize that I'm lumping a lot of users who post great informative content
in with the "better drink my own piss" image macro crowd, but karma just
doesn't seem like the best metric here.

It's also harder to come up with a good threshold for karma. Where do you draw
the line? 10k karma? 100k? Are we only including comment karma or submission
karma? Both?

At least the Gold members have, in general, unambiguously contributed to the
site. I don't have a problem using that as the primary metric here.

(I should note that I'm a reddit plebeian in this regard. I have no golden dog
in this fight.)

~~~
scorpion032
This conversation is better had over there, at reddit. And since this morning,
the response on reddit itself has been much along the lines I predicted.

------
jlampart
If I understand correctly, sites using AWS, and reddit.com for sure, are
geared towards automatically starting and stopping EC2 instances as needed. If
that is the case, why can't they just move everything to a different region,
which hasn't been affected by this disaster?

------
ck2
Is there any proof that Conde Nast is trying to starve off Reddit so they can
maybe write it off on their taxes? They just don't seem to have a single ounce
of pride in owning it.

It's amazing how much has been done with its development given how little
they have to work with.

------
cabalamat
I recently set up a website, and did consider AWS to host it. I decided not
to, based on the problems Reddit had been having with AWS. I eventually got a
VPS; looks like I made the right decision.

~~~
enjo
Counterpoint: We've been happily using AWS for several years now. We haven't
had one second of downtime. We host across multiple availability zones, and
have been pretty insulated. We are hosted in the eastern region (which was the
one having issues yesterday), but managed to get through everything ok. We did
lose some instances in one AZ, but failed back to our other instances without
any incident.
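
The "lose one AZ, fall back to the others" behaviour described above can be
sketched like this. It is a toy model only: the instance names, zone names and
helper functions are invented for illustration and are not an AWS API.

```python
from itertools import cycle


def healthy_backends(instances, down_zones):
    """Keep only the instances whose zone is not currently marked as down."""
    return [inst for inst in instances if inst["zone"] not in down_zones]


def make_router(instances, down_zones):
    """Round-robin over whatever is still healthy (hypothetical helper)."""
    pool = healthy_backends(instances, down_zones)
    if not pool:
        raise RuntimeError("no healthy instances in any zone")
    return cycle(pool)


# Hypothetical fleet spread over two zones.
FLEET = [
    {"name": "web-1", "zone": "zone-a"},
    {"name": "web-2", "zone": "zone-b"},
    {"name": "web-3", "zone": "zone-b"},
]
```

With `zone-a` marked down, `make_router(FLEET, {"zone-a"})` only ever hands
out `web-2` and `web-3`. The point is that this only helps if instances were
already running in more than one zone before the failure.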

ANY provider is going to have issues, but AWS actually gives you a bunch of
nice options to easily handle those. I'm not saying that everyone should use
Amazon, but don't take Reddit's issues (I'm not familiar enough with their
architecture to know where the fault really lies) to be the only data point
for AWS reliability.

~~~
bdonlan
If you do use AWS, however, it is vitally important to understand the tools
they provide for redundancy, and their limitations. There are multiple
availability zones for a reason, and although there was some multi-AZ downtime
at the start, that recovered relatively quickly.

------
blhack
I'm kind of curious how they're going to deal with the self-serve ads that were
supposed to run yesterday.

Just a straight refund, or...?

/me patiently waiting for it to come back up so that I can buy some for next
week... :(

~~~
groovylick
jedberg said yesterday he will add a day to the self-serve ads.

------
goalieca
Sadly, my productivity has not improved one bit :(

------
RoadRunner_23
I wonder what the users of Reddit are doing? Going for a walk outside and
enjoying nature?

~~~
simeshev
Maybe this is Amazon's way to tell Reddit addicts to stop and smell the
flowers? :-)

------
lisper
If anyone from reddit is reading this, I tried to sign up for Gold so I could
log in, but I can't because signing up requires you to log in. I suspect I'm
not the only person who would have happily bribed you to get to the head of
the line. I think you guys left a lot of money on the table.

