

Twitter releases Gizzard, a framework for creating distributed datastores - aviv
http://engineering.twitter.com/2010/04/introducing-gizzard-framework-for.html

======
benologist
With all the downtime and the seemingly permanent "Older tweets are temporarily
unavailable" messages, why would anyone want to build on platforms they've
created?

Maybe they have good ideas, but I think part of the reason people are looking
at Cassandra etc. is that the high-profile sites using/developing these
platforms _actually work_.

~~~
gfodor
This is such a cynical and short-sighted comment. Did Twitter have scalability
problems? Yes. Does this mean that any and all technology that comes out of
the company is junk? On the contrary. If anything, battle-tested software that
was built to solve massive scalability issues is going to be better than
software that wasn't.

Clearly they've worked through a lot of their problems at this point; I'd
imagine this is some of the code that let them do so.

Never mind the fact that a reply like this is disrespectful towards people who
are giving you the fruits of their labor _for free_. Shame on you.

~~~
benologist
Maybe they nailed everything with Gizzard, but it'd be more convincing if
their scalability and other problems were actually solved. I'm not sure why
you alluded to those problems in the past tense -
<http://status.twitter.com/>.

If I wanted to be disrespectful I'd have said their problems are still very
real and very frequent, and that you'd have to be foolish to assume they've
gotten anything but marketing right so far.

Would you build on Gizzard right now? Their status blog is full of very recent
reasons why it's premature for them to release anything and even more so why
it's premature to trust or be grateful to them for doing so.

~~~
nkallen
Our uptime could use some improvement, surely. You will bear in mind that it
has improved substantially over the last year. It's largely because of things
like Gizzard...

I think one of the compelling things about Gizzard is that unending list of
crazy and obscure failure conditions that cascade in unpredictable ways and
take the site down--all of those that have occurred up till now are encoded
into the design of Gizzard so that they do not occur again.

We have not fixed those "unknown unknowns", to be revealed in the future, that
can cause a Gizzard system to crash. But my guess is that, over the last year,
we have built in fail-safes for scenarios that exceed anything most people
have ever experienced or ever will experience-- and by a long shot.

I love Cassandra, I think it's awesome. Twitter plans to move its Tweet
storage to Cassandra. Suffice it to say that Cassandra is young and the
reliability of Cassandra with our throughput and our (very very large) corpus
could use some improvement. Cassandra is not yet ready for production use at
Twitter despite the fact that it has been deployed successfully at Digg and
elsewhere. The Cassandra community is improving the database very rapidly. In
a year or two I expect Cassandra to be a reasonable option for many popular
web sites. But, even then, Cassandra's design has certain limitations that
might prompt you to design a custom store (or consider alternatives). And
indeed, there is a lot of stuff we have at Twitter that we have no plans to
store in Cassandra unless there are substantial design changes.

An advantage of Gizzard here is that it is more flexible. You could, for
example, build a document-partitioned inverted index with Gizzard such that
you could do local intersections on each shard and merge these intersections
in Gizzard itself. You might, in fact, use Gizzard in front of Lucene to do
just this. There is no way to do that with Cassandra, though perhaps they will
build such features in the future. (To anticipate an objection, Lucandra is
not document partitioned so it will suffer from fundamental efficiency
problems for a certain class of search queries.)
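
A minimal sketch of the document-partitioned idea described above, written in Python rather than Gizzard's Scala. It is not Gizzard's or Lucene's API: the `Shard` and `DocumentPartitionedIndex` names, the modulo routing, and the whitespace tokenizer are all assumptions made for illustration. Each shard intersects posting lists only for the documents it holds, and the router merges the per-shard results.

```python
from collections import defaultdict


class Shard:
    """One partition: holds a subset of the documents and its own local index."""

    def __init__(self):
        # term -> ids of the documents on this shard that contain the term
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, terms):
        # Local intersection: only documents stored on this shard are considered.
        sets = [self.postings.get(term, set()) for term in terms]
        return set.intersection(*sets) if sets else set()


class DocumentPartitionedIndex:
    """Hypothetical router standing in for the role the comment gives Gizzard."""

    def __init__(self, num_shards):
        self.shards = [Shard() for _ in range(num_shards)]

    def add(self, doc_id, text):
        # Partition by document id, so all terms of a document live on one shard.
        self.shards[doc_id % len(self.shards)].add(doc_id, text)

    def search(self, query):
        terms = query.lower().split()
        merged = set()
        for shard in self.shards:
            # Fan out to every shard, then merge the local intersections.
            merged |= shard.search(terms)
        return sorted(merged)
```

Because a document never spans shards, no posting lists have to cross the network to answer a multi-term query; only the already-intersected results do.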

~~~
benologist
There's no disputing you guys have massively increased your stability in the
last year, and while growing enormously as well, which makes it that much more
of a victory.

But it would be good if you could go into more detail on your blog about which
parts of Twitter depend on Gizzard, and especially about how Gizzard is
separate from your remaining problems, rather than just talking about how it
works.

~~~
nkallen
I think you've given a good suggestion for upcoming blog posts.

Gizzard is used by two systems at Twitter. One of them is called FlockDB and
we are working on open-sourcing it (indeed, FlockDB is why we open-sourced
Gizzard). FlockDB stores Twitter's social graphs. I cannot tell you (yet) how
many QPS we do or how many edges there are in the various graphs, but
suffice it to say it's a lot, and we run lots of complicated queries like "who
follows both @aplusk and @oprah but does not want any of @aplusk's retweets,"
etc.
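
To make that example query concrete, here is a toy sketch in Python using plain set operations. It says nothing about how FlockDB actually stores or queries edges. The `follows` and `retweets_off` edge sets, the user names other than @aplusk and @oprah, and both function names are hypothetical, and the query is read as "users who follow both accounts and have also turned off the first account's retweets".

```python
# Hypothetical edge sets; each pair is (source user, destination user).
follows = {
    ("alice", "aplusk"), ("alice", "oprah"),
    ("bob", "aplusk"), ("bob", "oprah"),
    ("carol", "oprah"),
}
# (user, author) means the user does not want the author's retweets.
retweets_off = {("bob", "aplusk"), ("carol", "aplusk")}


def sources_of(edges, destination):
    """Return every source user with an edge pointing at `destination`."""
    return {src for src, dst in edges if dst == destination}


def follows_both_without_retweets(a, b):
    """Users who follow both `a` and `b` and have turned off `a`'s retweets."""
    return (
        sources_of(follows, a)
        & sources_of(follows, b)
        & sources_of(retweets_off, a)
    )
```

With this toy data, `follows_both_without_retweets("aplusk", "oprah")` returns only `bob`: `alice` follows both but still wants the retweets, and `carol` does not follow @aplusk.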

Gizzard is NOT perfect. But we think you'll find it is resilient to a large
class of failure scenarios (that have, in fact, occurred over the last year,
so this is not just the theoretical fault tolerance of a project that's been
deployed on a small web site with simple requirements). With your help, it
could be even better.

------
jluxenberg
If a shard goes down, writes to it are queued on a Gizzard middleware host.
What happens if that Gizzard host is then lost?

~~~
eaceaser
The queues are local to each host so you'd lose those writes, but since writes
are written to a journal on disk, you can apply some redundancy strategy to
that as necessary. Depends on the SLA really.
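
A rough sketch of the journaling idea in Python, under stated assumptions. This is not Gizzard's queue implementation: the `JournaledQueue` class, the JSON-lines journal format, and the `apply` callback are all hypothetical. It also assumes writes are idempotent, since a crash in the middle of a drain replays writes that were already applied.

```python
import json
import os
from collections import deque


class JournaledQueue:
    """Hypothetical host-local write queue backed by a journal file on disk."""

    def __init__(self, path):
        self.path = path
        self.pending = deque()
        # Replay the journal on startup so queued writes survive a restart.
        if os.path.exists(path):
            with open(path) as journal:
                for line in journal:
                    self.pending.append(json.loads(line))

    def enqueue(self, write):
        # Journal first, then queue in memory, so an acknowledged write is on disk.
        with open(self.path, "a") as journal:
            journal.write(json.dumps(write) + "\n")
            journal.flush()
            os.fsync(journal.fileno())
        self.pending.append(write)

    def drain(self, apply):
        # Deliver queued writes once the shard is reachable again.
        while self.pending:
            apply(self.pending[0])
            self.pending.popleft()
        # Everything was applied, so the journal can be truncated.
        open(self.path, "w").close()
```

A journal like this only protects against a process restart on the same host. Surviving the loss of the host itself would need the redundancy strategy mentioned above, for example keeping the journal on replicated storage.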

