

The Basecamp outage / slowdown today - andrewpbrett
http://productblog.37signals.com/products/2009/11/the-basecamp-outage-slowdown-today.html

======
dasht
That would seem to be a fairly deep architectural flaw. Namely, that (roughly
speaking) all users of the basecamp server experience performance and
reliability roughly equal to that of those users experiencing the worst
performance and reliability at any given moment. The guys who depend on the
service for $2^N value per day are in the same boat as those who depend on it
for $2^(N-K) value. Each group that depends on it for $2^N value CAN NOT
differentiate themselves, on reliability, from their competitors who also rely
on it for $2^N value.

That's old news, really. It's well known that that's a problem with SaaS,
centralization, and applications without competition. It's well known that it
would be fixed by letting basecamp users install and run the system on servers
under their own control with full software freedom over the thing (trading,
probably, a little bit less reliability for each customer against faster
recoveries for most and no "global outages" and the ability to invest in
hardening the infrastructure for a competitive advantage).... it's just
interesting to see those theoretical defects confirmed in practice.

~~~
zefhous
That doesn't really fix the issue though... Sure, you won't have "global
outages," but I would bet that the overall amount of downtime would be
significantly larger when installed on private servers and not centralized and
managed by 37signals.

I doubt that recoveries would be faster anyway.

~~~
dasht
Recoveries would (likely, for now - less likely a few years from now) be more
often necessary and slower for most users all else being equal, but it's not
that simple.

First, if users have software freedom and are running the system on servers
under their own control, they are free to spend as much as they like, exactly
how they like, to improve reliability. That's the main point of the $2^N analysis:
if Alice's Transnational Inc. stands to lose up to $500K on a given day if the
service is out, she can buy insurance in terms of back-up servers (and her
back-up servers only need to serve her users, not every user of the software,
anywhere). Alice is not limited to the same QoS as Bob's Main St. Paint Store.
37signals has to optimize its present and future profitability - not Alice's
value received. There's a misalignment of incentives, there.

Second, if the software were free and generally deployed in a distributed,
decentralized fashion, its architecture could evolve to be more of a P2P
system such that many different servers would all have to go down at once to
cause more than a slight inconvenience and performance degradation.
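
To put rough numbers on the "many servers would all have to go down at once"
point, here is a minimal sketch. It assumes each server fails independently
with the same probability, which real deployments rarely satisfy (shared
networks, shared bugs), so treat the result as an optimistic bound. The
function name `prob_all_down` and the 1% figure are hypothetical, chosen only
for illustration.

```python
def prob_all_down(p_down: float, replicas: int) -> float:
    """Probability that every one of `replicas` servers is down at the same time.

    Assumption (not from the thread): failures are independent and each
    server is down with the same probability `p_down`.
    """
    if not 0.0 <= p_down <= 1.0:
        raise ValueError("p_down must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # Independent events: multiply the individual probabilities together.
    return p_down ** replicas


if __name__ == "__main__":
    # Hypothetical figure: each server is unavailable 1% of the time.
    for n in (1, 2, 3):
        print(f"{n} server(s): chance of total outage = {prob_all_down(0.01, n):.6%}")
```

Under those assumptions each extra replica cuts the chance of a total outage
by a factor of 1/p_down, which is the intuition behind both Alice's back-up
servers and the P2P variant.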

------
mrshoe
Why don't they just come clean and apologize for any inconvenience this may
have caused me?

Edit: Sorry, I know HN doesn't encourage one-liners and jokes, but I couldn't
resist this one... I guess you had to be at Startup School to appreciate it.

~~~
sjs382
"Once again, we're really sorry for this. It was very painful to watch
Basecamp be unavailable or slow for that long when we knew what we had to do
to prevent this in the future. Please accept our apologies. Thanks for being a
Basecamp customer."

------
nopal
What caches are they referring to? DB caches? Anyone know what tools they may
be referring to?

~~~
aaronblohowiak
Looks like DB caches indeed.

~~~
nopal
I wouldn't think that most of the queries are executed that often, given that
the "sites" are so unique.

I'd be curious to hear more about how they utilize caching.

------
dabeeeenster
Why is this news?

~~~
thaumaturgy
It isn't. However, it takes only 11 of HN's multitudes of users to decide that
they thought it was interesting, and upvote it, to put it on the front page.

