
Manhattan: Real-time, multi-tenant distributed database for Twitter scale - jaboutboul
https://blog.twitter.com/2014/manhattan-our-real-time-multi-tenant-distributed-database-for-twitter-scale
======
ffk
From what I understand, Manhattan is based on ideas from ElephantDB.
Unfortunately, development on ElephantDB has pretty much stopped, despite the
fact that Nathan Marz is writing a big data book that depends on it:
[http://www.manning.com/marz/](http://www.manning.com/marz/)

Summingbird (bear with me, I'll tie this in) is also Twitter's answer to
writing code once and running it on a variety of execution platforms such as
Hadoop, Storm, Spark, Akka, etc. Not all of these have been built out, but the
platform was designed as a generic framework to support write-once,
execute-everywhere.

Summingbird is written to support Manhattan's model as well. The high-level
idea is to use versioning to determine whether a request is precomputed
(batch), computed (realtime), or a hybrid (precomputed + computed). These are
expressed as monoids, with the basic functionality provided by Algebird. One
way to bring support for this model to the open source world would be to
implement Storehaus bindings for ElephantDB and to resurrect ElephantDB, or
build a similar service providing storage along the lines of Manhattan.
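To make the merge idea concrete, here is a minimal Scala sketch, with the
monoid hand-rolled instead of pulled from Algebird and every name hypothetical:

    // Hand-rolled stand-in for the monoid merge (in Summingbird this comes
    // from Algebird); "Counts" here is a made-up example of a batch view.
    object HybridRead {
      type Counts = Map[String, Long] // e.g. impressions per tweet id

      // Associative combine with Map.empty as the identity, i.e. a monoid.
      def plus(a: Counts, b: Counts): Counts =
        b.foldLeft(a) { case (acc, (k, v)) => acc.updated(k, acc.getOrElse(k, 0L) + v) }

      // A hybrid read: the precomputed batch result merged with whatever the
      // realtime layer has accumulated since the last batch version.
      def read(batch: Counts, realtime: Counts): Counts = plus(batch, realtime)
    }

    // HybridRead.read(Map("t1" -> 100L), Map("t1" -> 3L, "t2" -> 1L))
    //   == Map("t1" -> 103L, "t2" -> 1L)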

Overall, very early yet promising work in the open source community.

[edit: book is not about elephantdb, but is a critical component. modified
wording. Also added link]

~~~
lenn0x
When Manhattan first started, our first use case was exactly this: supporting
batch (something we talk about in our blog post). It was similar to ElephantDB;
the biggest difference was that we wrote our own storage engine (SeaDB)
instead of using BDB JE. We built the Hadoop pipeline service so developers
only needed to supply us with sequence files, and we did the rest of the work
for them by watching for new data.
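For a rough idea of what that watching amounts to, here is a sketch only; the
paths, the _SUCCESS marker convention, and importVersion() below are
placeholder assumptions, not Manhattan's real interface:

    // Sketch: poll an HDFS drop directory for completed batch versions and
    // kick off an import. Everything here is illustrative.
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    object BatchWatcher {
      def newVersions(fs: FileSystem, dropDir: Path, seen: Set[String]): Seq[Path] =
        fs.listStatus(dropDir).toSeq
          .filter(_.isDirectory)
          .map(_.getPath)
          .filterNot(p => seen.contains(p.getName))
          .filter(p => fs.exists(new Path(p, "_SUCCESS"))) // writer finished

      def importVersion(version: Path): Unit =
        println(s"importing sequence files from $version") // stand-in for the real import

      def main(args: Array[String]): Unit = {
        val fs = FileSystem.get(new Configuration())
        var seen = Set.empty[String]
        while (true) {
          val fresh = newVersions(fs, new Path("/dropzone/mydataset"), seen)
          fresh.foreach(importVersion)
          seen ++= fresh.map(_.getName)
          Thread.sleep(60000) // poll every minute
        }
      }
    }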

Over time, Manhattan evolved into a fully fledged read/write database that can
support batch + read/write in the same system. Batch is great for some use
cases, but sometimes the cost is too high when you factor in how much
processing power, storage, etc. that model needs. We of course still support
both; we want developers to have the freedom to increase their productivity.

------
caniszczyk
Corresponding Twitter Engineering blog post:
[https://blog.twitter.com/2014/manhattan-our-real-time-multi-...](https://blog.twitter.com/2014/manhattan-our-real-time-multi-tenant-distributed-database-for-twitter-scale)

~~~
Nacraile
Yeah, that's much more interesting. The Wired article just repeats the
overview and loses the interesting technical details. I'm not sure why it was
posted instead of the original content.

------
teodimoff
I think it's not an exaggeration at all. Twitter literally pulls data from
thousands of servers, but we're missing the point of Manhattan. As some of us
know, Twitter services scale dynamically according to the load they are
serving, and some engineers at Twitter decided to give this idea a whole
truckload of steroids. Here is the trick: make container agents (storage
services) "clients" of the Manhattan database (the core). They are Mesos
processes which scale dynamically to the needs of the service (i.e. 1
container = 10,000 reads/writes per second, 2 containers = 20,000 reads/writes
per second, and so on), which allows dynamic scaling of reads per second and
writes per second. The core handles finding the actual machines which hold the
data, replicating it, and so on. There might be realtime storage service
containers which need fast data access, the batch importer and timeseries
services they mention, and so on. This requires a lot of gymnastics but offers
a lot of nice features. The Manhattan database acts as a virtual layer over
thousands of machines, and the storage services allow for customized data
manipulation. Cool... huh? Given the importance and the scale (multi-DC) this
operates at, I think it's almost impossible to open source. But who knows,
miracles happen now and then.
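If the scaling really is that linear, capacity planning reduces to something
like this (illustrative numbers only, nothing from the post):

    // Illustrative only: container count if each storage-service container
    // handles a fixed number of requests per second, as described above.
    object ContainerMath {
      val perContainerRps = 10000L // assumed per-container throughput

      def containersFor(targetRps: Long): Long =
        math.ceil(targetRps.toDouble / perContainerRps).toLong
    }

    // ContainerMath.containersFor(10000) == 1
    // ContainerMath.containersFor(25000) == 3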

~~~
teodimoff
Or at least I think something along those lines is happening.

~~~
teodimoff
And their alerting system is called Koalabird (picture of the Manhattan DB in
the post - Alerting section :))

------
iLoch
Interesting.. The database sounds almost too good to be true. I wonder if
they'll open source this. They've done so in the past with projects like
Storm, so I'm hopeful.

~~~
gopalv
Consistency, that's the usual performance killer.

They've got the Hadoop watcher right, as well as the BTree/SSTable update
mechanisms - heavy import, light update is a use case that usually isn't
catered to.

Reading up on its architecture, the rest of it feels a lot like memcache (the
membase/zbase impls) - plus SSDs, which are always going to beat the pants off
anything with disks.

Looks good overall; this is probably not for you if you want to do exact
counters or match up different counters (i.e. +1, +1, +1 for a counter funnel).

If you don't need updates to be consistent - like if you imported Hadoop data
without modification - this starts to look like a really good model.
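A toy contrast of why exact counters don't fit that model: with last-writer-wins
reconciliation a concurrent +1 can be lost, while merging per-writer deltas
(the import-style, add-only case) keeps both. This illustrates the general
problem, not Manhattan's internals:

    // Toy contrast, not Manhattan's internals.
    object CounterMerge {
      def main(args: Array[String]): Unit = {
        val start = 10L
        val a = start + 1 // replica A applies +1, sees 11
        val b = start + 1 // replica B applies +1 concurrently, also sees 11
        println(s"last-writer-wins: ${math.max(a, b)}") // 11: one +1 is lost
        val deltas = Map("a" -> 1L, "b" -> 1L) // increments tracked per writer
        println(s"delta merge: ${start + deltas.values.sum}") // 12: both kept
      }
    }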

------
RcouF1uZ4gsC
Can someone enlighten me as to why 6000 tweets a second is something to make a
big deal about? At 140 characters per message, that comes out to 840,000
bytes/s, i.e. under 1 megabyte per second. In 2014, is a service that can
handle 1 megabyte/s impressive?

~~~
njharman
It's not. Figuring out which of the 240 million accounts they should be sent
to is.

~~~
Retric
I spent a little while trying to implement a Twitter back end for fun during
their fail whale period, and it's really just a data structure problem. (Edit:
_Figuring out which of the 240 million accounts they should be sent to._ ) You
can literally get the basic feed process running at 10k tweets per second with
100 million accounts on a single PC and a 1Gbit Ethernet connection, where the
Ethernet connection is the bottleneck. Fanning that out to various places that
listen in is a fairly basic scaling issue.

Not that I took it very far, but my first stab at seeing what the scaling
issues would be was actually plenty fast to run their feed process at the time.

PS: The 'trick' is to keep two lists: one of everything a user follows, and
another of everything that follows each account. For showing messages to
someone when they log in, you keep the last 10 message IDs per account with a
timestamp or sequence ID, so if someone follows 5k accounts you can avoid
looking at the vast majority of those messages, then sign their device up to
listen for new messages. (Sure, sometimes you will need to look past that top
10, but it's rather effective.)

(And for the downvoters I can upload some code if you want to see it.)
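In the meantime, here is a bare-bones sketch of the two-list structure as
described above (not the code referenced; the names and the 10-entry cache are
just the idea, not an actual implementation):

    // Per-user "follows" and "followers" sets, plus a small cache of each
    // account's most recent message ids so a login doesn't scan every
    // followed account's full history.
    import scala.collection.mutable

    class ToyTwitter {
      val follows   = mutable.Map.empty[Long, mutable.Set[Long]]  // user -> accounts they follow
      val followers = mutable.Map.empty[Long, mutable.Set[Long]]  // account -> users following it
      val recent    = mutable.Map.empty[Long, List[(Long, Long)]] // author -> last 10 (seqId, msgId)

      def follow(user: Long, target: Long): Unit = {
        follows.getOrElseUpdate(user, mutable.Set.empty[Long]) += target
        followers.getOrElseUpdate(target, mutable.Set.empty[Long]) += user
      }

      def post(author: Long, seqId: Long, msgId: Long): Unit =
        recent(author) = ((seqId, msgId) :: recent.getOrElse(author, Nil)).take(10)

      // Home timeline at login: merge the cached heads of every followed
      // account, newest first; only go to colder storage if this isn't enough.
      def timeline(user: Long, limit: Int = 10): List[Long] =
        follows.getOrElse(user, mutable.Set.empty[Long]).toList
          .flatMap(a => recent.getOrElse(a, Nil))
          .sortBy(-_._1)
          .take(limit)
          .map(_._2)
    }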

~~~
abalone
Ah, hubris. Try simulating some celebrities talking to one another. 15M
followers (Ashton Kutcher) / 10K fanouts per second = 25 minutes to deliver
one tweet.

That's the difference between "getting the basic process running" and
operating at scale.

~~~
Retric
Ehh, first off the 2009 twitter was vary different than the 2014 beast. For
some context: [http://computer.howstuffworks.com/internet/social-
networking...](http://computer.howstuffworks.com/internet/social-
networking/information/5-events-to-crash-twitter.htm)

Back when they were growing from 500,000 to 7 million total users they were
having major issues, and that's when I was looking into things.
Anyway, not that I was suggesting 1Gbit would be fine today. Still, I was
saturating a 1Gbit connection, so 15M (over twice the total users back then) *
~200 bytes * 8 / ~1000^3 = ~24 seconds. Did you have an extra 60x multiplier
in there somewhere?

------
swah
I don't even look at databases before Aphyr verifies that they keep their
promises...

~~~
coolsunglasses
You'd be better served by using his Jepsen series as an opportunity to learn
how to evaluate and break databases yourself instead of waiting to be
spoonfed.

A $DATABASE_COMPANY was recently looking to hire somebody to be a dedicated
database breaker. The Jepsen series was mentioned in the job post.

Worshipping celebrity is unhealthy in what should be an engineering profession
and is a sign of a deeply embedded pop culture.

Don't be lazy.

~~~
swah
I was kidding... I've never needed to use anything but Postgres.

------
swang
The article calls Gizzard "strongly consistent"; Gizzard's GitHub page says
"eventually consistent" [1]. What?

[1] [https://github.com/twitter/gizzard](https://github.com/twitter/gizzard)

~~~
lenn0x
Yes, I think the article got that wrong. You are right that Gizzard is
eventually consistent. The storage engine for Gizzard is MySQL, so if you want
to do some node-local indexing you at least have MySQL's strong consistency,
but you won't achieve it in the overall system across keys.

------
dfcarney
"Real-time" is a bit of a misnomer as far as databases are concerned,
especially if you're talking about a system that defaults to eventual
consistency. I think they would have been better off saying "high-
availability".

~~~
SamReidHughes
High availability isn't the same thing as having consistently low latencies.
Some databases have spiky latency graphs without it being an availability
issue.

------
feelstupid
Does anyone else find the opening statement a little misleading? Yes, the data
may not originally come from one place, but it is all sent from Twitter to the
app of your choosing via JSON or similar. Sure, there will be more than one
request for the icon sprite and user avatars, but they all go to Twitter.

"When you open the Twitter app on your smartphone and all those tweets, links,
icons, photos, and videos materialize in front of you, they’re not coming from
one place. They’re coming from thousands of places."

~~~
Nacraile
It sounds like you're misinterpreting. The way I'm reading it, by "thousands
of places" they mean "thousands of machines". Which may be an exaggeration,
but it should be trivially obvious that data is sharded and must be collated
for display.

------
hoodoof
So this is an internal system, right? What is the point of telling the world
if it's not available for anyone to look at or use? Perhaps a recruiting
exercise?

~~~
dfcarney
Google, for one, has been doing this kind of thing for years. Dremel
([http://research.google.com/pubs/pub36632.html](http://research.google.com/pubs/pub36632.html)),
for instance, was the subject of a paper they published after they'd been
developing and using it internally for years. It's never been open sourced,
but there's now Apache Drill
([http://incubator.apache.org/drill/](http://incubator.apache.org/drill/))
which is based on the same paper. In short, it's part marketing, part
recruiting, and part ego. I'm hopeful that Twitter will at least publish a
research paper that digs into some of the nuances of Manhattan and
compares/contrasts against other technologies.

~~~
caniszczyk
There is also Parquet from Twitter which is inspired by the Dremel paper:
[https://blog.twitter.com/2013/dremel-made-simple-with-parque...](https://blog.twitter.com/2013/dremel-made-simple-with-parquet)

------
haddr
Although I think the article is interesting, I'm missing some details - more
engineering stuff rather than just high-level description.

~~~
jaboutboul
This is the technical blog post: [https://blog.twitter.com/2014/manhattan-our-real-time-multi-...](https://blog.twitter.com/2014/manhattan-our-real-time-multi-tenant-distributed-database-for-twitter-scale)

