
In Twitter’s early days, only one celebrity could tweet at a time - evanweaver
https://theoutline.com/post/4147/in-twitters-early-days-only-one-celebrity-could-tweet-at-a-time
======
liquidgecka
If anybody is interested in random Twitter internal stuff I might bang out a
Medium post one day. I was neck-deep in the infra side of things for years and
have all sorts of funny stories.

Our managed hosting provider wouldn't let us use VPNs or anything that allowed
direct access to the managed network they provided, but we wanted to make
internal-only services that were not on the internet, so I set up a simple
little system that used DNS to point to private space in the office and an SSH
tunnel to forward the ports to the right places. Worked great, but over time
the internal stuff grew up, and our IT team refused to let me have a server in
the office, so it was all running off a pair of Mac Minis. We called them the
"load-bearing Mac Minis" since basically 90% of the production management
traffic went over the SSH tunnels they hosted. =)

~~~
lagadu
Posting that here is like showing a big juicy steak to a pit full of hungry
lions: of course we're interested!

------
molecule
2010: "At any moment, Justin Bieber uses 3% of our infrastructure. Racks of
servers are dedicated to him"

[https://gizmodo.com/5632095/justin-bieber-has-dedicated-
serv...](https://gizmodo.com/5632095/justin-bieber-has-dedicated-servers-at-
twitter)

~~~
firebones
But, but, but...why does Twitter have so many engineers? I could write Twitter
in a weekend!

--95% of anti-TWTR posters circa 2010-2016.

~~~
snovv_crash
Before being acquired, WhatsApp had what, 30 employees?

How did they do it? I know they used custom BSD servers so that a single box
could keep close to 1M TCP connections open. I'm sure with a fixed target to
aim for and all scope known upfront a small crack team of devs could do
something similar for Twitter.

~~~
zmb_
One-to-one vs. many-to-many messaging. The amount of work you need to do to
deliver a WhatsApp message is constant and small -- just route the message to
a single recipient's mailbox. The amount of work Twitter has to do to deliver
a message grows as a function of followers. One celebrity tweeting another
celebrity means you have to deliver the message to the mailboxes of the
followers of both -- millions of times more work than WhatsApp per message. In
addition, Twitter persists all the messages while WhatsApp doesn't.
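
Roughly, the two write paths look like this (a toy in-memory sketch; the
mailboxes/followers maps and function names are made up for illustration, not
anyone's actual code):

    from collections import defaultdict

    mailboxes = defaultdict(list)   # user_id -> delivered messages
    followers = defaultdict(set)    # user_id -> ids of that user's followers

    def deliver_dm(sender, recipient, text):
        # WhatsApp-style one-to-one: constant work, a single mailbox append.
        mailboxes[recipient].append((sender, text))

    def deliver_tweet(author, text):
        # Twitter-style one-to-many: work grows with the author's follower count.
        for follower in followers[author]:
            mailboxes[follower].append((author, text))

With millions of followers, that second loop is millions of writes per tweet,
which is why one celebrity tweeting at another is so much more expensive than
a chat message.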

~~~
segmondy
This! Everyone keeps saying it's BSD and Erlang and jumping on the Erlang
train. Fine tools, btw, but WhatsApp is super simple compared to Twitter.
_rolls eyes_

------
tschellenbach
If you want to read more about activity feeds there are a ton of papers listed
here: [https://github.com/tschellenbach/stream-
framework](https://github.com/tschellenbach/stream-framework) I've been
working on this stuff for years. Recently I've also enjoyed reading Linkedin's
posts about their feed tech. There are a few different posts, but here's one of
them: [https://engineering.linkedin.com/blog/2016/03/followfeed--
li...](https://engineering.linkedin.com/blog/2016/03/followfeed--linkedin-s-
feed-made-faster-and-smarter)

Scaling a social network is just inherently a very hard problem. Especially if
you have a large userbase with a few very popular users. Stackshare recently
did a nice blogpost about how we at Stream solve this for 300 million users
with Go, RocksDB and Raft: [https://stackshare.io/stream/stream-and-go-news-
feeds-for-ov...](https://stackshare.io/stream/stream-and-go-news-feeds-for-
over-300-million-end-users)

I think the most important part is using a combination of push and pull. So
you keep the most popular users in memory, and for the other users you use the
traditional fan-out-on-write approach. The other thing that helped us scale was
using Go+RocksDB. The throughput is just so much higher compared to
traditional databases like Cassandra.
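
The write-path decision is roughly this (a minimal sketch; POPULAR_THRESHOLD
and the store names are made up for illustration, not our production code):

    from collections import defaultdict

    POPULAR_THRESHOLD = 100_000           # illustrative cutoff, not a real number

    recent_by_author = defaultdict(list)  # hot posts kept in memory for popular authors
    timelines = defaultdict(list)         # precomputed per-user feeds (fan-out on write)

    def publish(author, post, follower_ids):
        if len(follower_ids) >= POPULAR_THRESHOLD:
            # Pull path: store the post once; readers fetch and merge it later.
            recent_by_author[author].append(post)
        else:
            # Push path: fan the post out to every follower's feed at write time.
            for f in follower_ids:
                timelines[f].append(post)

Readers then merge the posts pulled from popular authors into their
precomputed feed at read time.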

It's also interesting to note how other companies solved it. Instagram used a
fanout on write approach with Redis, later on Cassandra and eventually a
flavor of Cassandra based on RocksDB. They managed to use a full fanout
approach using a combination of great optimization, a relatively lower posting
volume (compared to Twitter at least) and a ton of VC money.

Friendster and Hyves are two stories of companies that didn't really manage to
solve this and went out of business. (there were probably other factors as
well, but still.) I also heard one investor mention how Tumblr struggled with
technical debt related to their feed. A more recent example is Vero that
basically collapsed under scaling issues.

~~~
ma2rten
I used to work at Hyves. Hyves overcame its scalability issues, but went out
of business for other reasons. Hyves used MySQL and Memcache, similar to
Facebook at that time.

By the way, RocksDB, which is now Facebook's main database (afaik), is built
on top of LevelDB. So both Google and Facebook run on software written by Jeff
Dean...

~~~
cdoxsey
It's a LevelDB fork with substantial changes. For example, a non-binary-search
file format option: [https://github.com/facebook/rocksdb/wiki/PlainTable-
Format](https://github.com/facebook/rocksdb/wiki/PlainTable-Format)

Pull down the code some time. Everything and the kitchen sink is in there
somewhere. It's a crazy project.

------
collinf
I haven't seen anyone touch on this, but I remember reading about this in
Designing Data-Intensive Applications[1]. The way they solved the celebrity
feed issue was to decouple users with large numbers of followers from normal
users.

Here is a quick excerpt; this book is filled to the brim with gems like this.

> The final twist of the Twitter anecdote: now that approach 2 is robustly
> implemented, Twitter is moving to a hybrid of both approaches. Most users’
> tweets continue to be fanned out to home timelines at the time when they are
> posted, but a small number of users with a very large number of followers
> (i.e., celebrities) are excepted from this fan-out. Tweets from any
> celebrities that a user may follow are fetched separately and merged with
> that user’s home timeline when it is read, like in approach 1. This hybrid
> approach is able to deliver consistently good performance.

Approach 1 keeps a single global collection of tweets; a user's home timeline
is assembled at read time by looking up everyone they follow and merging those
users' recent tweets.

Approach 2 fans each tweet out at write time, inserting it into every
follower's cached timeline, which works a bit like a mailbox.
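
Here's a minimal sketch of that hybrid read path (the store names and the
tweet tuple shape are assumptions for illustration, not the book's or
Twitter's actual code):

    import heapq

    def read_home_timeline(user, timelines, recent_by_author, followed_celebs,
                           limit=50):
        # Approach 2: the precomputed, fan-out-on-write "mailbox" for this user.
        precomputed = timelines.get(user, [])
        # Approach 1: pull celebrity tweets at read time instead of fanning out.
        pulled = [t for c in followed_celebs.get(user, ())
                  for t in recent_by_author.get(c, ())]
        # Tweets are assumed to be (timestamp, author, text) tuples, newest first.
        merged = heapq.merge(sorted(precomputed, reverse=True),
                             sorted(pulled, reverse=True),
                             reverse=True)
        return list(merged)[:limit]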

[1] [https://www.amazon.com/Designing-Data-Intensive-
Applications...](https://www.amazon.com/Designing-Data-Intensive-Applications-
Reliable-
Maintainable/dp/1449373321/ref=sr_1_1?ie=UTF8&qid=1527213498&sr=8-1&keywords=data+intensive+application&dpID=51PjhtI9VRL&preST=_SX218_BO1,204,203,200_QL40_&dpSrc=srch)

~~~
hinkley
It’s an oft-overlooked inequality in these systems. People get so wrapped up
in some whiz-bang thing that they don’t stop to think if they should.

At the end of the day, one of the most important questions about your
information architecture is: how many times is each write to the system
_actually_ observed? That answer can dictate a lot about your best processing
strategies.

~~~
tmd83
This. It took me a while to learn to look at this, and eventually to focus
less on caching items that don't get enough reads to justify the caching or
precalculation.

------
gringoDan
This isn't shocking - Twitter was notorious for being technically held
together with Scotch tape.

Honestly this hands-on approach is an impressive example of doing things that
don't scale.

~~~
giancarlostoro
I found it amusing that Twitter was Rails' biggest advertisement. Everyone
wanted to use Rails, but Twitter turned into a franken-app with different
stacks to keep it running.

~~~
phaedryx
Twitter was Rails' worst advertisement. They used Rails as a scapegoat to hide
their bad tech. I still hear things like "Rails can't scale; remember
Twitter?"

~~~
joering2
At least 3 large companies where friends of mine work as developers switched
from Rails to PHP over the last 5 years. All told me the same story: "after
Twitter, no one wants to work with or touch Rails anymore."

~~~
bdcravens
Meanwhile all 3 large companies probably host their code on Github :-)

~~~
treahauet
And even if they don’t, many if not most of their dependencies do.

------
sewercake
There was a fun High Scalability article about their 'fan-out' approaches to
disseminating tweets by popular users, etc.:
[http://highscalability.com/blog/2013/7/8/the-architecture-
tw...](http://highscalability.com/blog/2013/7/8/the-architecture-twitter-uses-
to-deal-with-150m-active-users.html) .

When I was working on something with similar technical requirements I also
came across this paper ([http://jeffterrace.com/docs/feeding-frenzy-
sigmod10-web.pdf](http://jeffterrace.com/docs/feeding-frenzy-
sigmod10-web.pdf)) that outlined the approach in a more 'formal' manner.

------
anyfoo
Ah, I read that Twitter thread a few days (weeks?) ago and it was much longer.
As far as I remember, it started with someone asking Twitter ops people,
former and current, to share some stories about things that went spectacularly
wrong.

It contained a lot of Twitter ops battle stories, some very interesting. I was
pretty impressed to read about Twitter internals at that level of detail, but
now it seems that the thread that held them all together is protected (the
author probably didn't expect it to be so popular, or just wanted to continue
more privately).

~~~
liquidgecka
And yet I bet nobody mentioned the "fire rain" in our first data center, the
load-bearing Mac Mini, or the surprise Snoop Dogg visit/party! =)

~~~
anyfoo
Hah, I don't think anybody has, but I'd love to hear those stories!

------
city41
In the early days of Twitter the "fail whale" was so common it got assimilated
into the culture as a term to use for any time a site gets overloaded. Nowadays
it seems like that term is "hugged to death".

[https://www.theatlantic.com/technology/archive/2015/01/the-s...](https://www.theatlantic.com/technology/archive/2015/01/the-
story-behind-twitters-fail-whale/384313/)

~~~
liquidgecka
Everybody knows about the "fail whale".. nobody knows about the "moan cone"..
The image is lost to history but it was captured by this account ages ago:
[https://twitter.com/moan_cone](https://twitter.com/moan_cone)

It was thrown when the system was failing at the Rails layer rather than the
Apache layer. I believe that Ryan King and I were the last people to ever see
the moan cone in production. =)

~~~
evanweaver
Do you remember the fixit kitten?

~~~
liquidgecka
That got nuked right about the time that I started! I never saw it live.

------
evanweaver
Missed that this went to the front page! I will answer questions if I can.

I am now CEO @ [https://fauna.com/](https://fauna.com/), making whales and
fails a thing of the past for everybody. We are hiring, if you want to work on
distributed databases.

~~~
liquidgecka
Long time no see man! =)

~~~
evanweaver
You too! Hope Zookeeper isn't still keeping you up.

~~~
liquidgecka
Funny thing is that the place I am at now uses Zookeeper.. luckily it has not
been trouble for us though. =)

~~~
evanweaver
It's saving it up for you...remember deleting thousands of Nagios
notifications on the bus? Those were the days.

~~~
liquidgecka
I remember all those iPhone suckers having to download special apps to remove
conversations in the early days since it only allowed you to delete them one
at a time. =)

I also remember getting on a kick to make all the fake alerting go away after
I started, only to find that like 50% of the configured alerts could never
succeed and had been paging every 10 minutes for over a year. We got that
stuff sorted out pretty quick. Once we started getting good visibility it
helped us make much better decisions about where the problems actually were.

The first six months felt like everything was just on fire constantly and
nobody knew what was going on, but then things started falling into place and
it felt like everything was still on fire, but some people actually had fire
extinguishers for a change. =)

~~~
VectorLock
The best monitoring system was the TV in the Ops area scrolling a constantly
updating search for #failwhale.

------
tschellenbach
Fun fact, Twitter's feed is still kinda broken. If you visit the site after
being gone for a week or so, it tells you your timeline is empty. It recovers
after a few minutes, but it's still a pretty poor user experience.

~~~
donttrack
I always get a message saying the request took too long to execute if I click
a link leading to Twitter. I have to reload the page a couple of times to make
it work.

Usually don't click on Twitter links for this reason.

~~~
eythian
Yes, I get "rate limited" on a single request on mobile web quite often. It's
strange.

------
logicallee
Say what you will, Twitter managed to make a $25.23B company out of printing

    
    
       char whatever[140];

~~~
tombell
should be 141 if you want to print out a string that would be 140 characters;
NULL termination is a thing.

------
lunchbreak
What frameworks would you use to handle such a steep growth curve? Most
startups I know of start off with Rails or the like - and obviously they
couldn't handle the strain. So what would you use?

~~~
tschellenbach
The web part of it is pretty easy to scale. You simply add more web servers.
The problem is in the storage/db layer. I imagine the feed was the primary
challenge scaling Twitter.

Other components, such as search, were probably also quite tricky. One thing
I've never figured out is how Facebook handles search on your timeline. That's
a seriously complex problem.

LinkedIn recently published a bunch of papers about their feed tech:

[https://engineering.linkedin.com/blog/2016/03/followfeed--
li...](https://engineering.linkedin.com/blog/2016/03/followfeed--linkedin-s-
feed-made-faster-and-smarter)

And Stream's stackshare is also interesting:
[https://stackshare.io/stream/stream-and-go-news-feeds-for-
ov...](https://stackshare.io/stream/stream-and-go-news-feeds-for-
over-300-million-end-users)

~~~
chipperyman573
> One thing I've never figured out is how Facebook handles search on your
> timeline. That's a seriously complex problem.

From a user's point of view, they don't. I search for things a lot (or at
least, I used to before I realized how bad it is) and things that I _know_ I
saw yesterday don't come up - even if I put in a very specific search.
Sometimes it'll say "no results found", but I'll find it in a tab I hadn't
closed yet.

------
adrianhel
This made me laugh and feel better. Being in a startup is tough!

------
JohnJamesRambo
It would be fascinating if they returned to this capability. One celebrity
gets the podium at a time.

------
iamaelephant
> Jason Goldman, who served as Vice President of Product at Twitter between
> 2007 and 2010, responded to Weaver’s tweets with the observation that early
> Twitter was “held together by sheer force of will.”

I would dispute that; I don't think they can take that much credit. Regardless
of their "sheer force of will", the site was down very, very frequently.

~~~
liquidgecka
Much of that was just bad design decisions in the early days creating a
momentum that was unstoppable and unreplaceable at the speed we were growing.

In the early days they implemented a Ketama system for memcache. It worked
great so long as nothing failed, since a node coming in or going out could
lead to bad results. In the old days they would just flush caches when that
happened. Later, when memcache was the only way we could serve the loads we
had, it became imperative that memcache never be restarted. A single memcache
restarting would overload the MySQL backend so badly that the site would be
down for hours. Adding a memcache node had more or less the same issues,
though it was a little easier to prewarm things. We wrote a kernel module that
allowed us to change the ulimits of a running process so we could increase the
file descriptor limits for memcache without restarting it.. Replacing it
completely was damn near impossible given the growth rate and the inability to
get the data out of MySQL quickly enough.
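
For context, Ketama is consistent hashing for memcache: servers are placed on
a hash ring so that a node joining or leaving only remaps a slice of the
keyspace, rather than nearly everything the way naive hash(key) mod
num_servers does. A toy sketch of the idea (nothing like our actual setup):

    import bisect
    import hashlib

    class KetamaRing:
        """Toy consistent-hash ring; real Ketama uses many virtual points per node."""

        def __init__(self, nodes, vnodes=100):
            self.ring = sorted((self._hash(f"{node}-{i}"), node)
                               for node in nodes for i in range(vnodes))
            self.points = [h for h, _ in self.ring]

        @staticmethod
        def _hash(s):
            return int(hashlib.md5(s.encode()).hexdigest(), 16)

        def node_for(self, key):
            # Walk clockwise to the first point at or after the key's hash.
            i = bisect.bisect(self.points, self._hash(key)) % len(self.ring)
            return self.ring[i][1]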

Us reliability guys worked 16-hour days, 7 days a week, for years trying to
keep things working well enough to not fail completely. Some of the crazy
hacks we did just to survive were fantastic and impressive, and I am still
proud of and disgusted by them to this day. =)

~~~
evanweaver
Indeed. "It was amazing that it worked, and terrible that we had to do it."

~~~
liquidgecka
I believe it was James May that said: "You have come up with a clever solution
to a problem you shouldn't have had in the first place!" =)

