

ElephantDB: a distributed database specialized in exporting data from Hadoop - omakase
http://tech.backtype.com/introducing-elephantdb-a-distributed-database

======
yummyfajitas
This is probably a dumb question, but what does ElephantDB offer over HBase?
It looks a lot simpler, but read only and with far fewer features.

~~~
nathanmarz
This is a good question.

The answer is simplicity and robustness. HBase is a very complex system. It's
a monster to configure and operate. If you don't need random write
capabilities and a K/V data model is sufficient for you, HBase may be
overkill.

ElephantDB has very few moving parts, so it "just works." Additionally, if
you make an update and write bad data to ElephantDB, you can easily delete the
bad version and roll back to a version with good data.

At the end of the day, they target different use cases.

~~~
yummyfajitas
_Additionally, if you make an update and write bad data to ElephantDB, you can
easily delete the bad version and roll back to a version with good data._

To be fair, HBase has this too. As long as you don't overwrite 3 times
(configurable, on a per-column family basis), you are fine.

I must say that an alternative to HBase which just works is mighty tempting.
I'm using HBase, and about half of the nasty problems I've run into have been
caused by either HBase directly or making HBase talk to Django (the thrift
interface works nicely in batch jobs, but causes all sorts of problems in a
webapp). Will definitely check out elephant.

~~~
nathanmarz
There are a few important differences between HBase versioning and ElephantDB
versioning. The HBase versioning is more of a "buffer" that gives you a
limited window to roll back when you make a mistake. As you said, if you
overwrite too many times you can't recover.
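
To illustrate the "buffer" behavior, here is a toy model in Python. It does not use the HBase API; the class name `BoundedVersionCell` and its methods are hypothetical, and the limit of 3 simply mirrors the number mentioned above. It only shows how a bounded version history pushes out the oldest value once the limit is exceeded.

```python
from collections import deque


class BoundedVersionCell:
    """Toy model (not HBase code): keeps only the newest max_versions writes."""

    def __init__(self, max_versions=3):
        # A deque with maxlen silently drops the oldest entry when full.
        self._versions = deque(maxlen=max_versions)

    def put(self, value):
        self._versions.append(value)

    def history(self):
        # Oldest retained version first, newest last.
        return list(self._versions)


cell = BoundedVersionCell(max_versions=3)
cell.put("good")
for bad in ("bad-1", "bad-2", "bad-3"):
    cell.put(bad)

# After three overwrites the good value has fallen out of the window.
recoverable = "good" in cell.history()
```

With two bad writes instead of three, `"good"` would still be in `history()` and a rollback would be possible.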

The ElephantDB versions are totally independent from one another, whereas with
HBase they're stored in the same index. The only tradeoff to storing more
ElephantDB versions is using more space on the distributed filesystem.
ElephantDB versions don't go away until you delete them, which lets you use
whatever strategy you want to manage versions. For example, you may decide to
keep one version each month, and every version for the past week. This lets
you do analytics that looks at your data from long ago.
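
A retention strategy like the one described could be sketched as follows. This is not ElephantDB code: the function `versions_to_keep` is hypothetical, versions are represented only by their creation dates, and "one version each month" is assumed to mean the newest version in each calendar month.

```python
from datetime import date


def versions_to_keep(version_dates, today, recent_days=7):
    """Return the set of version dates to retain.

    Keeps every version created within the last recent_days days, plus
    the newest version of each calendar month (an assumed interpretation).
    """
    keep = set()
    newest_in_month = {}
    for d in version_dates:
        # Rule 1: keep everything from the recent window.
        if 0 <= (today - d).days < recent_days:
            keep.add(d)
        # Rule 2: track the newest version seen for each (year, month).
        key = (d.year, d.month)
        if key not in newest_in_month or d > newest_in_month[key]:
            newest_in_month[key] = d
    keep.update(newest_in_month.values())
    return keep


today = date(2011, 3, 15)
versions = [
    date(2011, 1, 5), date(2011, 1, 20), date(2011, 2, 10),
    date(2011, 3, 1), date(2011, 3, 10), date(2011, 3, 14),
]
keep = versions_to_keep(versions, today)
to_delete = sorted(set(versions) - keep)
```

Here `to_delete` contains the versions from 2011-01-05 and 2011-03-01; everything else is either recent or the newest version of its month.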

------
jbellis
Link to the last time this was posted:
<http://news.ycombinator.com/item?id=2237158>

------
wtn
I was going to snarkily say that they named their database after PostgreSQL's
mascot, but then I remembered that Hadoop also has an elephant logo.

~~~
zmitri
There's also a CL db named Elephant: <http://common-lisp.net/project/elephant/>

