The problem wasn't Ruby. The problem was the way that Twitter used Ruby. We had one big monorepo with every single function and every piece of business logic baked into a single place. That logic relied on monkey patching and all sorts of crazy, horrible glue to keep it working together. Every time we had to scale up, we would glue infra in place to keep things working while we came up with a real solution (which never really materialized).
In my time there we had memcache instances that held timelines. Populating them took hours or days, and while they were unpopulated the site was offline, so rebooting or restarting the caches was simply not an option. Our data sharding strategy was temporal: we would spin up a new database cluster every few weeks to handle all of the incoming tweets, and failing to spin up a new cluster in time meant a global site outage. Don't even get me started on the "load-bearing Mac mini".
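To make the temporal sharding concrete, here's a minimal Ruby sketch of the idea: each tweet routes to whichever cluster covers its timestamp, so a gap in coverage means writes have nowhere to go. All names and dates here are made up for illustration, not the actual code:

```ruby
require "time"

# Each shard covers a half-open time range [starts_at, ends_at).
Shard = Struct.new(:name, :starts_at, :ends_at)

# Illustrative clusters, one spun up every few weeks.
SHARDS = [
  Shard.new("db-cluster-01", Time.utc(2009, 1, 1), Time.utc(2009, 2, 1)),
  Shard.new("db-cluster-02", Time.utc(2009, 2, 1), Time.utc(2009, 3, 1)),
]

# Find the cluster responsible for a tweet's timestamp. If no shard
# covers "now", incoming tweets cannot be written anywhere -- the
# "failed to spin up a new cluster in time" outage described above.
def shard_for(time)
  SHARDS.find { |s| time >= s.starts_at && time < s.ends_at } or
    raise "no shard covers #{time.iso8601} -- site is down"
end

puts shard_for(Time.utc(2009, 1, 15)).name  # prints db-cluster-01
```

The failure mode falls straight out of the design: the routing function has no default, so coverage of the current wall clock is an operational obligation, not a code path.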
In reality, the only problem Rails itself really contributed was that it could only process a single request per process at a time. Each machine would spin up 16 or 32 processes to handle requests in parallel, but each process needed its own connection to the database, to memcache, etc. At one point we had something like 100k processes all trying to talk to a single MySQL master. Much of this could have been mitigated by better design, of course, but Rails encourages models that don't scale up to crazy dimensions.
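The arithmetic behind that is simple and brutal. A back-of-the-envelope sketch (the fleet size here is an illustrative guess, not the real number):

```ruby
# Process-per-request connection growth, with no shared pooling.
PROCESSES_PER_MACHINE = 32     # one request in flight per process
MACHINES              = 3_000  # made-up fleet size for illustration

processes = PROCESSES_PER_MACHINE * MACHINES

# Every process holds its own socket to the MySQL master, to
# memcache, and so on -- connections scale with process count,
# not with machine count:
mysql_connections    = processes
memcache_connections = processes

puts processes  # prints 96000, in the neighborhood of the ~100k above
```

A threaded or evented server, or a shared connection proxy in front of MySQL, would have let connections scale with machines instead of processes, which is one version of the "better design" mitigation mentioned above.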
In reality moderation was virtually impossible because we were in a 24/7 fight with ourselves about how to keep the system alive for the next couple of days. Constant infighting, managerial changes (I had 9 different managers in 3 years), focus changes (we didn't finish the last major site redesign before starting the next one) and a general unwillingness to pause features long enough to stabilize the system meant we were always on the losing end of an infra battle.
Every so often a person would have a brilliant idea on how to solve our scaling issues. They would then disappear into a corner to invent yet-another-bird-themed-datastore. After a few weeks/months they would appear with a magical new thing that would fix all our problems and would make everybody happy. Every single time it would fail.
Having a team that is not the main team design something means that they likely didn't understand the state of the thing they were replacing. The thing they were replacing was a bucket of edge cases, none of which they knew about. The scale never looked like what they expected because in the meantime the load had changed. This was compounded by the constant desire to hire somebody external who could solve the problem for us. They would come in with ego and a feeling that they had a mandate to replace it all. Eventually they would learn just how fragile and complicated the system was, only to then be considered old guard enough to be replaced by the next wave of experts. =/
But the number one killer was that every single thing was baked into the monorepo, so it wasn't like they could just shim in something to replace the old thing. All the while they were building a change to the data store, other devs had added 15 new features that they now had to port over. In the time it took to port those over, another 20 had been added... etc.
Just getting the okay to pause feature development was like pulling teeth and it only bought you a few weeks at best.
Can I get this on a T-shirt?
We had an API service, a web interface, the legacy web interface that was still used for select devices because the new UI didn't quite work right on them, an even older legacy interface that was necessary because a bunch of badly behaved early-day clients still relied on its functionality and were popular enough that turning them off would cause outrage, the "zero" interface used in countries with low-bandwidth connections, and the mobile interface.
Each interface had to implement all the different variations on functionality: timelines with inline tweet rendering (automatic expansion of images, etc.), lists (alternate-view timelines), the whole following graph (duplicated for lists as well), verified users and all the infra around that, search, public/private designations, direct messages, notifications via email, text message, and mobile app, favorites, retweets, replies, plus a slew of statistics and information-tracking data integrated directly into the site... That's only the user-visible stuff. There are a TON of experiments and projects that run behind that interface in a way the user will never completely see.
We heard over and over that Twitter was so simple it could run on a laptop, and every time it reminded me just how clueless most developers are when it comes to seeing the body of work needed to make something like Twitter work, even more so at the scale we were talking about.