
What's Happening with Twitter? - icey
http://blog.twitter.com/2010/06/whats-happening-with-twitter.html
======
chuhnk
Underlying schema or code changes that drastically affect the service's
availability. In terms of MySQL, you're talking locked tables whenever there's
a schema change. But then I don't know if they still use MySQL; they were
talking about a move to Cassandra. At that scale it is never simple. If a
change takes 30 seconds on your dev machine, it is not going to be that way in
production with a billion rows partitioned across multiple tables.
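To make the dev-vs-production gap concrete: a table-rewriting ALTER scales roughly linearly with row count, so the same statement that finishes in seconds on a small dev table can lock a billion-row table for hours. A back-of-envelope sketch (all rates and row counts here are illustrative assumptions, not measurements):

```python
# Back-of-envelope: a blocking, table-rewriting ALTER scales roughly
# linearly with row count. The rewrite rate below is an assumed,
# illustrative figure, not a benchmark.

def alter_time_seconds(rows, rows_per_second):
    """Estimate how long a blocking table rewrite takes at a fixed rate."""
    return rows / rows_per_second

# Dev box: 1M rows at ~33k rows/s is about 30 seconds.
dev = alter_time_seconds(1_000_000, 33_000)

# Production: 1B rows at the same rate is over 8 hours of locked table.
prod = alter_time_seconds(1_000_000_000, 33_000)

print(f"dev:  {dev:.0f} s")
print(f"prod: {prod / 3600:.1f} h")
```

Same statement, same rate, three orders of magnitude more rows: the lock goes from unnoticeable to an outage.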

I am actually curious as to what this "deep" issue is they speak of.

You guys got any examples of infrastructure issues that could cause
availability problems in a large scale environment?

~~~
amix
Schema changes on big tables take hours or days depending on their size. You
don't do this in production. What you normally do in a MySQL setup is run a
master-master pair with one active write master. This way you can run
optimizations on the passive master and then switch masters, without putting
your whole production system in jeopardy. See Flickr's setup for more details.

There are tons of things that can go wrong in large scale environments
(network issues, load balancing, slow code, databases starting to swap,
databases starting to use a lot of IO, etc.).

------
BigZaphod
"Over the next two weeks, we may perform relatively short planned maintenance
on the site. During this time, the service will likely be taken down."

Translation:

"If the site goes down over the next two weeks, just pretend it was all part
of the plan."

~~~
mey
Except they proceeded to indicate which channels you could listen to for
notice of _planned_ downtime.

