

Gizzard - Twitter's open source framework for creating distributed datastores - abraham
http://github.com/twitter/gizzard

======
rbranson
THIS is the future of "NoSQL." It's all about custom, distributed datastores.
This is going to make generalized database software look like shrink-wrapped
software sitting next to custom, purpose-built software. When done right, there's just
no comparison.

------
cperciva
I think we're going to see more and more motion in this direction. The
problems of block storage, of building structured data out of block storage
(even key-value is a structure!), of partitioning/replication, and of read
caching are fundamentally different, and there's no good reason why they
should all be squashed into the same codebase. (Depending on how partitioning
and replication are done, they are sometimes separate and sometimes need to be
done together.)

------
dacort
Hm, another passing mention of their distributed graph database, FlockDB.

~~~
nkallen
it's coming...

------
rajasaur
From the article: "In order to achieve "eventual consistency", this "retry
later" strategy requires that your write operations are idempotent. This is
because a retry later strategy can apply operations out-of-order (as, for
instance, when newer jobs are applied before older failed jobs are retried)."
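
A minimal sketch of why the quoted requirement matters, assuming each write carries a timestamp and the store keeps only the newest one. The `Register` class and its fields are hypothetical names for illustration and are not taken from Gizzard.

```python
from dataclasses import dataclass


@dataclass
class Register:
    """A single value guarded by the timestamp of the write that set it."""
    value: object = None
    timestamp: int = -1

    def apply(self, value, timestamp):
        # Ignore any write that is not newer than what we already hold,
        # so replays and late retries cannot clobber newer data.
        if timestamp > self.timestamp:
            self.value = value
            self.timestamp = timestamp


# Writes applied in their original order.
in_order = Register()
in_order.apply("old", 1)
in_order.apply("new", 2)

# The same writes under a "retry later" strategy.
retried = Register()
retried.apply("new", 2)  # the newer job succeeds first
retried.apply("old", 1)  # the older failed job is retried afterwards
retried.apply("old", 1)  # and happens to be retried a second time

assert in_order == retried
```

Both registers end up holding the newer write, regardless of the order or number of retries.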

What do you do when idempotency is not possible? With relational
databases, how do folks tackle this?

I can understand that it would be excellent for storing search indexes and
non-relational databases though.

~~~
nkallen
It's hard to come up with examples where idempotency is impossible (I'm sure
there are some)... but there are definitely cases where it is difficult.
Counters are one of the most obvious examples; to make them idempotent you
need to jump through a lot of hoops. Usually you assign a transaction-id to
each increment/decrement operation and you keep a log of which have been
applied. Suffice it to say this explodes the cost of storing a counter (which
would otherwise only require 32/64 bits).
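
A rough sketch of the transaction-id approach, assuming an in-memory set serves as the log of applied operations. `IdempotentCounter` and its method names are made up for illustration and are not part of Gizzard.

```python
class IdempotentCounter:
    """Counter whose increments and decrements are safe to replay."""

    def __init__(self):
        self.value = 0
        self.applied = set()  # log of transaction ids already applied

    def apply(self, txn_id, delta):
        # A replayed transaction id is a no-op, which makes retries safe.
        if txn_id in self.applied:
            return False
        self.applied.add(txn_id)
        self.value += delta
        return True


counter = IdempotentCounter()
counter.apply("t1", +1)
counter.apply("t2", +1)
counter.apply("t1", +1)  # retry of t1: ignored
counter.apply("t3", -1)

assert counter.value == 1
assert len(counter.applied) == 3
```

The `applied` set grows with every operation, which is the storage cost described above: the counter itself is one integer, but the log is unbounded unless it is compacted somehow.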

Other things are hard to make idempotent but it's still practical. Examples of
this include operations like "delete all rows matching query Q". This either
means "delete all rows for now and forevermore" or "delete all rows that exist
at time T". In either case new rows matching Q might arrive in the future (but
be antedated to the past) and you have to keep the operation around in some
way to apply the delete operation _in the future_. This can be easy if your
query is easy to represent, and there is a limited class of such queries.
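
One way to picture the "delete all rows that exist at time T" case, assuming the query is restricted to a single field-equals-value test and every row carries a timestamp. The `Table` class below is a hypothetical illustration, not Gizzard's API.

```python
class Table:
    """Rows plus stored deletes that keep applying to late arrivals."""

    def __init__(self):
        self.rows = {}     # row_id -> (fields, timestamp)
        self.deletes = []  # stored deletes: (field, wanted, as_of)

    def _is_deleted(self, fields, timestamp):
        # A row is covered by a stored delete if it matches the query
        # and is dated at or before the delete's time T.
        return any(fields.get(field) == wanted and timestamp <= as_of
                   for field, wanted, as_of in self.deletes)

    def insert(self, row_id, fields, timestamp):
        if self._is_deleted(fields, timestamp):
            return  # antedated row arriving after the delete
        current = self.rows.get(row_id)
        if current is None or timestamp > current[1]:
            self.rows[row_id] = (fields, timestamp)

    def delete_where(self, field, wanted, as_of):
        # Remember the delete so it can be applied to future arrivals.
        if (field, wanted, as_of) not in self.deletes:
            self.deletes.append((field, wanted, as_of))
        self.rows = {rid: (f, ts) for rid, (f, ts) in self.rows.items()
                     if not self._is_deleted(f, ts)}


a = Table()
a.insert(1, {"user": "bob"}, 5)
a.delete_where("user", "bob", 10)
a.insert(2, {"user": "bob"}, 7)   # dated before T, arrives late: dropped
a.insert(3, {"user": "bob"}, 12)  # dated after T: kept

b = Table()                        # same operations, reordered and retried
b.delete_where("user", "bob", 10)
b.insert(3, {"user": "bob"}, 12)
b.insert(2, {"user": "bob"}, 7)
b.insert(1, {"user": "bob"}, 5)
b.delete_where("user", "bob", 10)

assert a.rows == b.rows
```

Both tables end up with only row 3. The stored delete list is the part that has to be kept around, and it stays cheap only because the query is easy to represent.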

Sorry, it's hard to be precise about this in comments. The bottom line is
Gizzard is not perfect for everything but idempotency is worth jumping through
hoops a lot of the time (Gizzard or no!)

------
mark_l_watson
Wow, I thought that I was getting pretty good at coding in Scala until I just
spent 30 minutes reading Gizzard's code base. Ugh, now I feel like I have only
been using about 20% of the language.

Good on Twitter for releasing this - looks like good infrastructure code. I
want to look at the Rowz sample application when I get some free time.

------
plq
this sounds an awful lot like dns. i wonder whether they evaluated deploying
this solution using existing dns software. the additional features of this
solution that justify its cost are not obvious from the link.

