

Downtime incident today - ubernostrum
https://www.djangoproject.com/weblog/2014/nov/08/downtime-incident-today/

======
jumasheff
Related?
[https://news.ycombinator.com/item?id=8576848](https://news.ycombinator.com/item?id=8576848)

------
octotoad
I haven't gone further than reading TFA, but was there anything beyond the
tweeted screenshot to suggest that this was an actual "compromise"? Seems like
a quick and easy way to cause quite a large fuss with minimal effort.

~~~
jsmeaton
So leaving potentially compromised packages online while investigating would
have been more appropriate? I can see where you're coming from, but what would
you have recommended instead?

If this was an actual compromise, and they had remained online for _hours_
while investigating, there'd be lynch mobs out to get them.

~~~
octotoad
Don't get me wrong, they definitely handled this in an appropriate manner.
"Better safe than sorry", but I just wonder about these sorts of situations.
It reminds me of the ridiculous "SWATting" craze; wasting time and resources
on something that turns out to be a false alarm. Good way to stir up a bunch
of hysteria without any heavy lifting.

~~~
bradleyland
That doesn't answer the question: what should they have done instead? We
struggle with this at our company all the time. It's very easy to point out
the shortcomings in any course of action, because these kinds of decisions
almost always have trade-offs. As pointed out, the alternative was to leave
potentially compromised packages online. If you do a risk/outcome matrix
analysis on this decision, they clearly made the right one.

------
taspeotis
Speaking of downtime, I can't get to llvm.org.

~~~
jevinskie
[00:59:29] <kavon> seems llvm.org is down. planned maintenance?

[01:06:57] <+chandlerc> kavon: network outage

[01:07:41] <+chandlerc> but Criswell has contacted a grad student at UIUC and
the building with the server in it is having building-wide network issues
apparently

------
stefantalpalaru
> given the circumstances we believe that taking the Django project's servers
> offline to investigate was the correct response

But that's how you get DOSed with Photoshop and bullshit. There must be a
better policy than disappearing for 2 hours while you investigate.

~~~
obviouslygreen
Why does "we're taking it down while we figure out what's going on" not
qualify as a valid approach? Shouldn't safety be a higher priority than "well,
someone might want to look at the home page, so we'll put more effort into a
crappy default landing page"?

It's easy (and often correct), in hindsight, to say that some things should be
prepared for in better or different ways. However, that's a different animal
than "crap, something is up, we need to react in the best way we can now."

~~~
stefantalpalaru
> Why does "we're taking it down while we figure out what's going on" not
> qualify as a valid approach?

Because availability is more important than your forensic shenanigans. (2
hours to verify a bunch of checksums? really?)
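
For reference, the kind of checksum verification mentioned above could look roughly like the sketch below. It is only an illustration, not what the Django team actually ran: the `find_mismatches` helper, the manifest layout (file name mapped to a known-good SHA-256 hex digest), and the directory structure are all assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(manifest, root):
    """Return the names (sorted) that are missing under root or whose
    hash differs from the manifest.

    manifest is a hypothetical dict of {relative file name: expected
    SHA-256 hex digest}, assumed to come from a trusted offline copy.
    """
    bad = []
    for name, expected in sorted(manifest.items()):
        path = Path(root) / name
        if not path.is_file() or sha256_of(path) != expected.lower():
            bad.append(name)
    return bad
```

An empty list from `find_mismatches` only says the files match the manifest; it says nothing about whether the manifest itself, or the rest of the server, can be trusted.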

