

FlockDB: Twitter's distributed, fault-tolerant graph database - qhoxie
http://github.com/twitter/flockdb

======
amix
By the looks of the code this isn't really a graph database. It's more a graph
database emulation built on top of a SQL database. For distribution it uses
sharding so it isn't really distributed either, at least not like Cassandra or
other distributed databases.

I think Redis would perform a lot better than SQL for graph-like structures -
since sets are a native datatype in Redis. And you can go A LONG way with just
one Redis database (currently we are storing over 20 million keys in our Redis
database and I know some that are storing 100 million keys on _one_ server).
And with the new Redis VM coming up, I would guess that scalability of Redis
is going to be even better.
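
As a rough sketch of the set-based model described above, here is a graph kept as one set of neighbours per user. Plain Python sets stand in for Redis sets, so nothing here talks to a real Redis server; the key layout and the function names `follow` and `mutual_follows` are assumptions made up for illustration.

```python
from collections import defaultdict

# In-memory stand-in for Redis sets: one set per user and direction.
# The "followers:<id>" / "following:<id>" key layout is an assumption.
followers = defaultdict(set)   # would be a key like "followers:<id>"
following = defaultdict(set)   # would be a key like "following:<id>"

def follow(src, dst):
    """Record the edge src -> dst in both directions (two SADD-like writes)."""
    following[src].add(dst)
    followers[dst].add(src)

def mutual_follows(a, b):
    """Users that both a and b follow (a SINTER-like set intersection)."""
    return following[a] & following[b]

follow(1, 2)
follow(1, 3)
follow(4, 3)
follow(4, 5)
print(sorted(mutual_follows(1, 4)))  # [3]
```

With a real Redis server the two writes would map to SADD and the intersection to SINTER, which is why sets being a native datatype matters here.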

Other than this, neo4j seems very interesting and would probably also have
been a better choice than using a relational database.

~~~
lsb
You can go a long way with a SQL database too. I'm writing up an article about
getting the two billion words of Wikipedia into an inverted index in a SQLite
database entirely in memory, and that's another order of magnitude
bigger.

~~~
amix
If you only know how to use a hammer then everything else looks like a nail.
In other words, re-implementing an inverted index in SQL is a waste of time when
you can use tools like Sphinx and Lucene - which are highly optimized for
inverted indexes and can easily handle 2 billion words. The same can be
said about FlockDB - it's possible to emulate a graph database, but is the
effort really worth it when there are tools like Redis and neo4j which
seem to be optimized for graph-like structures?

------
al_james
Hmmm... This looks to only store first-order relations efficiently. It seems
that to traverse many nodes, you would need to repeatedly query the database
(e.g. I can only get my friends, not the friends of my friends etc...). This
severely limits its use for most problem domains you would want a graph
DB for. Still, possibly useful if you have to solve a problem that looks a lot
like Twitter's.
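
To make the repeated-query point concrete, here is a minimal sketch assuming the store exposes only a depth-1 lookup. The `neighbors` helper and the dict-backed `graph` are hypothetical stand-ins, not FlockDB's API.

```python
def neighbors(graph, node):
    """The only primitive a depth-1 store offers: one node's direct edges."""
    return graph.get(node, set())

def friends_of_friends(graph, node):
    """Second-order neighbours, built from one depth-1 query per friend."""
    first = neighbors(graph, node)
    queries = 1  # the initial lookup for the node itself
    second = set()
    for friend in first:
        second |= neighbors(graph, friend)
        queries += 1
    second -= first          # drop people who are already direct friends
    second.discard(node)     # drop the starting node
    return second, queries

graph = {"a": {"b", "c"}, "b": {"a", "d"}, "c": {"a", "d", "e"}}
result, queries = friends_of_friends(graph, "a")
print(sorted(result), queries)  # ['d', 'e'] 3
```

The number of round trips grows with the number of friends, and with fan-out at every further level, which is the cost being described.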

~~~
wheels
Once you start getting out to second order it becomes a much more complicated
(and interesting) problem -- one that I've been kicking around for a while.

Data-locality is the kicker in a distributed graph database; when doing
traversals that cross multiple nodes you need to have a partitioning scheme
that coordinates with your traversal algorithms so that you need the minimum
number of machine-to-machine hops in a multi-level traversal. Getting that
right is far more difficult than traditional database sharding.
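
A toy illustration of why the partitioning scheme matters: count how many edges cross a shard boundary under two placements. Both shard functions are invented for this example and say nothing about how Gizzard actually assigns shards.

```python
def cross_shard_edges(edges, shard_of):
    """Number of edges whose endpoints live on different shards."""
    return sum(1 for u, v in edges if shard_of(u) != shard_of(v))

# Two small chains of node ids: 0-1-2-3 and 4-5-6-7.
edges = [(0, 1), (1, 2), (2, 3), (4, 5), (5, 6), (6, 7)]

def hash_shard(n):
    """Hypothetical hash-style placement: alternate ids across two shards."""
    return n % 2

def range_shard(n):
    """Hypothetical locality-aware placement: each chain on its own shard."""
    return n // 4

print(cross_shard_edges(edges, hash_shard))   # 6
print(cross_shard_edges(edges, range_shard))  # 0
```

Every crossing edge is a potential machine-to-machine hop during a traversal, so the second placement keeps a walk along either chain on a single machine.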

~~~
al_james
Yeah sure.... it is much harder and involves minimizing the number of
relations across shard boundaries. Not easy. However, to call a system that
only allows depth 1 traversals a 'graph database' is slightly pushing the
definition. To me, it's more a "key value database with relations between
keys".

Everyone has different requirements though. If depth 1 and huge scale are what
you need, FlockDB might be for you.

------
hendler
I've yet to try out Gizzard. Wasn't expecting FlockDB to be released so
soon.

Wondering if FlockDB is truly abstracted from MySQL/Cassandra. And also
wondering how performance compares to Neo4j.

~~~
nkallen
Note that it's _in the process of_ being released: it's as yet unusable by
outsiders. Honestly, I did not expect this to make Hacker News so soon. :(

I have not used Neo4J first hand. It has really cool features, but it is not a
distributed database and has expensive memory usage. FlockDB is distributed,
uses little memory, and has a very limited feature-set that is highly highly
optimized for OLTP. It's not really an apples/apples comparison.
Theoretically, Neo4J could be used as a back-end data-store in FlockDB.

~~~
emileifrem
I agree that it's not apples/apples. From the first few minutes, I think the
main strength of Neo4j is the rich ecosystem and functionality on top of it,
and the fact that it stores an infinite-levels deep graph. In comparison
FlockDB stores one level (e.g. user -> followers). The main strength of
FlockDB seems to be that it has built-in distribution, which is something
we're working on for Neo4j but it's not yet generally available.

All this of course based on just a quick glance, so I may come back all the
wiser and revise my opinion later. :)

-EE [<http://neo4j.org>]

------
labria
I didn't look too deep into the "distributed" features (no docs yet, the code
suggests sharding), but the feature set looks a lot like Redis sets.

~~~
qhoxie
Many (all?) of the distributed features (including the sharding) are part of
gizzard, which it sits on top of.

<http://github.com/twitter/gizzard>

~~~
labria
Makes even less sense to me, then.

~~~
simonw
Redis is less than a year old. I doubt it was a serious contender when Twitter
started building their own solution.

~~~
emileifrem
Well, the data model seems very similar to Redis' from first glances [1], but
FlockDB certainly seems to have completely different durability
characteristics. So even if they started anew today they may end up building
their own.

[1] Which would make FlockDB less a graph db and more a key-value store with
social network semantics for the values.

-EE [<http://neo4j.org>]

------
riffraff
I took a look at the code but I'm not sure I understand one thing: why one
class per file for case classes? Scala is much cooler than that :)

------
labria
Scala again? Damn, my bet was on Clojure this time! =)

------
jseifer
The contributors section names four people. If only four people wrote
something like this, that's ridiculously impressive.

~~~
wheels
It's only about 2000 lines of code. More like, "that it took four people to
write this leaves an impression". ;-)

(In all seriousness, no dig on the authors, planning on poking through some of
the source in the next bit.)

~~~
brown9-2
size of source code is not a good measurement of size of achievement

~~~
wheels
I'm just going to post a link to my response to that comment the last time it
came up:

<http://news.ycombinator.com/item?id=1155026>

"Lines of code doesn't say anything" is one of those flawed mantras that people
keep repeating as an overreaction to the too often used assumption that it's
the _most_ important metric.

~~~
brown9-2
Not sure if I get your point here. My response to a flawed statement is
flawed?

------
moe
"Fault tolerant" and "Twitter" in the same sentence?

